Commit fb7f1ac

docs: update reference architecture: glossary, scale tests methodology (#12438)
1 parent 8427998, commit fb7f1ac

1 file changed: +173 -0 lines changed

docs/admin/reference-architectures.md

Lines changed: 173 additions & 0 deletions
@@ -0,0 +1,173 @@
# Reference architectures

This document provides prescriptive solutions and reference architectures to
support successful deployments of up to 2000 users and outlines, at a high
level, the methodology currently used to scale-test Coder.

## General concepts

This section outlines core concepts and terminology essential for understanding
Coder's architecture and deployment strategies.

### Administrator

An administrator is a user role within the Coder platform with elevated
privileges. Admins have access to administrative functions such as user
management, template definitions, insights, and deployment configuration.

### Coder

Coder, also known as _coderd_, is the main service. We recommend deploying it
with multiple replicas to ensure high availability. It provides an API for
managing workspaces and templates. Each _coderd_ replica can host multiple
[provisioners](#provisioner).

### User

A user is an individual who utilizes the Coder platform to develop, test, and
deploy applications using workspaces. Users can select available templates to
provision workspaces. They interact with Coder using the web interface, the CLI
tool, or by calling API methods directly.

### Workspace

A workspace refers to an isolated development environment where users can write,
build, and run code. Workspaces are fully configurable and can be tailored to
specific project requirements, providing developers with a consistent and
efficient development environment. Workspaces can be autostarted and
autostopped, enabling efficient resource management.

Users can connect to workspaces using SSH or via workspace applications like
`code-server`, facilitating collaboration and remote access. Additionally,
workspaces can be parameterized, allowing users to customize settings and
configurations based on their unique needs. Workspaces are instantiated using
Coder templates and deployed on resources created by provisioners.

### Template

A template in Coder is a predefined configuration for creating workspaces.
Templates streamline the process of workspace creation by providing
pre-configured settings, tooling, and dependencies. They are built by template
administrators on top of Terraform, allowing for efficient management of
infrastructure resources. Additionally, templates can utilize Coder modules to
leverage existing features shared with other templates, enhancing flexibility
and consistency across deployments. Templates describe provisioning rules for
infrastructure resources offered by Terraform providers.

### Workspace Proxy

A workspace proxy serves as a relay connection option for developers connecting
to their workspace over SSH, a workspace app, or through port forwarding. It
helps reduce network latency for geo-distributed teams by minimizing the
distance network traffic needs to travel. Notably, workspace proxies do not
handle dashboard connections or API calls.

### Provisioner

Provisioners in Coder execute Terraform during workspace and template builds.
While the platform includes built-in provisioner daemons by default, there are
advantages to employing external provisioners. These external daemons provide
secure build environments and reduce server load, improving performance and
scalability. Each provisioner handles one workspace build at a time, allowing
for efficient resource allocation and workload management.
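
Because each provisioner runs a single build at a time, the number of
provisioners bounds how many builds can run in parallel. The sketch below
estimates how long a batch of builds might take under that constraint. The
function name and the 60-second build duration are hypothetical and chosen only
for illustration; real build times depend on the template and the
infrastructure.

```python
import math


def estimate_build_time(workspaces: int, provisioners: int,
                        seconds_per_build: float) -> float:
    """Estimate wall-clock seconds needed to build `workspaces` workspaces.

    Assumes each provisioner runs exactly one build at a time and that
    every build takes the same (hypothetical) duration.
    """
    if workspaces < 0 or provisioners <= 0:
        raise ValueError("workspaces must be >= 0 and provisioners > 0")
    # Builds proceed in "waves" of at most `provisioners` parallel builds.
    waves = math.ceil(workspaces / provisioners)
    return waves * seconds_per_build


# 2000 workspaces on 50 provisioners, assuming 60 s per build:
# 40 waves * 60 s = 2400 s.
print(estimate_build_time(2000, 50, 60.0))  # 2400.0
```

This is a rough upper-level model: it ignores queueing overhead and variation
between builds.
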

### Registry

The Coder Registry is a platform where you can find starter templates and
_Modules_ for various cloud services and platforms.

Templates help create self-service development environments using
Terraform-defined infrastructure, while _Modules_ simplify template creation by
providing common features like workspace applications, third-party integrations,
or helper scripts.

Please note that the Registry is a hosted service and isn't available for
offline use.

## Scale-testing methodology

Scaling Coder involves planning and testing to ensure it can handle more load
without compromising service. This process encompasses infrastructure setup,
traffic projections, and aggressive testing to identify and mitigate potential
bottlenecks.

A dedicated Kubernetes cluster for Coder is a Kubernetes cluster specifically
configured to host and manage Coder workloads. Kubernetes provides container
orchestration capabilities, allowing Coder to efficiently deploy, scale, and
manage workspaces across a distributed infrastructure. This ensures high
availability, fault tolerance, and scalability for Coder deployments. Coder is
deployed on this cluster using the
[Helm chart](../install/kubernetes#install-coder-with-helm).

Our scale tests include the following stages:

1. Prepare environment: create expected users and provision workspaces.

2. SSH connections: establish user connections with agents, verifying their
   ability to echo back received content.

3. Web Terminal: verify the PTY connection used for communication with Web
   Terminal.

4. Workspace application traffic: assess the handling of user connections with
   specific workspace apps, confirming their capability to echo back received
   content effectively.

5. Dashboard evaluation: verify the responsiveness and stability of Coder
   dashboards under varying load conditions. This is achieved by simulating user
   interactions using instances of headless Chromium browsers.

6. Cleanup: delete workspaces and users created in step 1.

### Infrastructure and setup requirements

The scale tests runner can distribute the workload to overlap single scenarios
based on the workflow configuration:

|                      | T0  | T1  | T2  | T3  | T4  | T5  | T6  |
| -------------------- | --- | --- | --- | --- | --- | --- | --- |
| SSH connections      | X   | X   | X   | X   |     |     |     |
| Web Terminal (PTY)   |     | X   | X   | X   | X   |     |     |
| Workspace apps       |     |     | X   | X   | X   | X   |     |
| Dashboard (headless) |     |     |     | X   | X   | X   | X   |

This pattern closely reflects how our customers naturally use the system. SSH
connections are heavily utilized because they're the primary communication
channel for IDEs with VS Code and JetBrains plugins.
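
The staggered pattern in the table can be reproduced with a few lines of
Python. This is only an illustrative sketch: the four-slot duration and
one-slot offset are read off the table, and the function names are
hypothetical. It does not reflect the actual configuration format of the scale
tests runner.

```python
SCENARIOS = [
    "SSH connections",
    "Web Terminal (PTY)",
    "Workspace apps",
    "Dashboard (headless)",
]


def staggered_schedule(scenarios, duration=4, offset=1):
    """Map each scenario to the time slots in which it is active.

    Scenario i starts at slot i * offset and runs for `duration` slots,
    which matches the T0..T6 layout shown in the table.
    """
    return {
        name: list(range(i * offset, i * offset + duration))
        for i, name in enumerate(scenarios)
    }


def active_at(schedule, slot):
    """Return the scenarios that are running during the given slot."""
    return [name for name, slots in schedule.items() if slot in slots]


schedule = staggered_schedule(SCENARIOS)
for name, slots in schedule.items():
    print(f"{name}: T{slots[0]}..T{slots[-1]}")

# All four scenarios overlap only at T3.
print(active_at(schedule, 3))
```

Under these assumptions, T3 is the only slot in which all four scenarios run
at once.
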

The basic setup of the scale-test environment involves:

1. Scale tests runner (32 vCPU, 128 GB RAM)
2. Coder: 2 replicas (4 vCPU, 16 GB RAM)
3. Database: 1 instance (2 vCPU, 32 GB RAM)
4. Provisioner: 50 instances (0.5 vCPU, 512 MB RAM)
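
For capacity planning, it can help to total the resources in the list above.
The sketch below assumes that the figures in parentheses are per instance (the
list does not say so explicitly) and treats 512 MB as 0.5 GB. The `Component`
type and `totals` function are hypothetical helpers.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Component:
    name: str
    instances: int
    vcpu: float    # vCPU per instance (assumed)
    ram_gb: float  # RAM in GB per instance (assumed)


SETUP = [
    Component("scale tests runner", 1, 32, 128),
    Component("coderd", 2, 4, 16),
    Component("database", 1, 2, 32),
    Component("provisioner", 50, 0.5, 0.5),  # 512 MB treated as 0.5 GB
]


def totals(components):
    """Return (total vCPU, total RAM in GB) across all instances."""
    vcpu = sum(c.instances * c.vcpu for c in components)
    ram_gb = sum(c.instances * c.ram_gb for c in components)
    return vcpu, ram_gb


# 32 + 8 + 2 + 25 vCPU and 128 + 32 + 32 + 25 GB RAM.
print(totals(SETUP))  # (67.0, 217.0)
```
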

The test is deemed successful if users did not experience interruptions in their
workflows, `coderd` did not crash or require restarts, and no other internal
errors were observed.

### Traffic Projections

In our scale tests, we simulate activity from 2000 users, 2000 workspaces, and
2000 agents, with two items of workspace agent metadata being sent every 10
seconds. Here are the resulting metrics:

Coder:

- Median CPU usage for _coderd_: 3 vCPU, peaking at 3.7 vCPU during dashboard
  tests.
- Median API request rate: 350 req/s during dashboard tests, 250 req/s during
  Web Terminal and workspace apps tests.
- 2000 agent API connections with latency: p90 at 60 ms, p95 at 220 ms.
- An average of 2400 WebSocket connections during dashboard tests.

Provisionerd:

- Median CPU usage is 0.35 vCPU during workspace provisioning.

Database:

- Median CPU utilization is 80%, with a significant portion dedicated to writing
  metadata.
- Memory utilization averages 40%.
- `write_ops_count` ranges between 6.7 and 8.4 operations per second.
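
As a back-of-the-envelope check on the metadata load described at the start of
this section, the aggregate rate of metadata items follows directly from the
agent count and the reporting interval. The helper name below is hypothetical,
and the sketch assumes each agent sends its two items every 10 seconds. Note
that this counts metadata items sent by agents, not database write operations.

```python
def metadata_items_per_second(agents: int, items_per_agent: int,
                              interval_seconds: float) -> float:
    """Aggregate metadata items per second across all agents."""
    if interval_seconds <= 0:
        raise ValueError("interval_seconds must be positive")
    return agents * items_per_agent / interval_seconds


# 2000 agents * 2 items / 10 s
print(metadata_items_per_second(2000, 2, 10))  # 400.0
```
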
