Validated Architectures
Many customers operate Coder in complex organizational environments, consisting of multiple business units, agencies, and/or subsidiaries. This can lead to numerous Coder deployments, due to discrepancies in regulatory compliance, data sovereignty, and level of funding across groups. The Coder Validated Architecture (CVA) prescribes a Kubernetes-based deployment approach, enabling your organization to deploy a stable Coder instance that is easier to maintain and troubleshoot.
The following sections detail the components of the Coder Validated Architecture, provide guidance on how to configure and deploy these components, and offer insights into how to maintain and troubleshoot your Coder environment.
Who is this document for?
This guide targets the following personas. It assumes a basic understanding of cloud/on-premise computing, containerization, and the Coder platform.
Role | Description |
---|---|
Platform Engineers | Responsible for deploying and operating the Coder deployment and its infrastructure |
Enterprise Architects | Responsible for architecting Coder deployments to meet enterprise requirements |
Managed Service Providers | Entities that deploy and run Coder software as a service for customers |
CVA Guidance
CVA provides: | CVA does not provide: |
---|---|
Single and multi-region K8s deployment options | Prescribing OS, or cloud vs. on-premise |
Reference architectures for up to 3,000 users | An approval of your architecture; the CVA solely provides recommendations and guidelines |
Best practices for building a Coder deployment | Recommendations for every possible deployment scenario |
For higher-level design principles and architectural best practices, see Coder's Well-Architected Framework.
General concepts
This section outlines core concepts and terminology essential for understanding Coder's architecture and deployment strategies.
Administrator
An administrator is a user role within the Coder platform with elevated privileges. Admins have access to administrative functions such as user management, template definitions, insights, and deployment configuration.
Coder control plane
Coder's control plane, also known as `coderd`, is the main service. We recommend deploying it with multiple replicas to ensure high availability. It provides an API for managing workspaces and templates, and serves the dashboard UI. In addition, each `coderd` replica hosts 3 Terraform provisioners by default.
User
A user is an individual who utilizes the Coder platform to develop, test, and deploy applications using workspaces. Users can select available templates to provision workspaces. They interact with Coder using the web interface, the CLI tool, or by directly calling API methods.
Workspace
A workspace refers to an isolated development environment where users can write, build, and run code. Workspaces are fully configurable and can be tailored to specific project requirements, providing developers with a consistent and efficient development environment. Workspaces can be autostarted and autostopped, enabling efficient resource management.
Users can connect to workspaces using SSH or via workspace applications like `code-server`, facilitating collaboration and remote access. Additionally, workspaces can be parameterized, allowing users to customize settings and configurations based on their unique needs. Workspaces are instantiated using Coder templates and deployed on resources created by provisioners.
Template
A template in Coder is a predefined configuration for creating workspaces. Templates streamline the process of workspace creation by providing pre-configured settings, tooling, and dependencies. They are built by template administrators on top of Terraform, allowing for efficient management of infrastructure resources. Additionally, templates can utilize Coder modules to leverage existing features shared with other templates, enhancing flexibility and consistency across deployments. Templates describe provisioning rules for infrastructure resources offered by Terraform providers.
Workspace Proxy
A workspace proxy serves as a relay connection option for developers connecting to their workspace over SSH, a workspace app, or through port forwarding. It helps reduce network latency for geo-distributed teams by minimizing the distance network traffic needs to travel. Notably, workspace proxies do not handle dashboard connections or API calls.
Provisioner
Provisioners in Coder execute Terraform during workspace and template builds. While the platform includes built-in provisioner daemons by default, there are advantages to employing external provisioners. These external daemons provide secure build environments and reduce server load, improving performance and scalability. Each provisioner can handle a single concurrent workspace build, allowing for efficient resource allocation and workload management.
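Because each provisioner handles one build at a time, the number of provisioners puts an upper bound on concurrent workspace builds. The following is a minimal sketch of that arithmetic, assuming the default of 3 built-in provisioners per `coderd` replica described earlier. The function name `max_concurrent_builds` and its parameters are hypothetical and not part of any Coder API.

```python
def max_concurrent_builds(
    coderd_replicas: int,
    builtin_per_replica: int = 3,  # default number of built-in provisioners per replica
    external_provisioners: int = 0,
) -> int:
    """Upper bound on simultaneous workspace builds.

    Each provisioner handles a single concurrent build, so the bound
    is simply the total number of provisioners.
    """
    counts = (coderd_replicas, builtin_per_replica, external_provisioners)
    if any(c < 0 for c in counts):
        raise ValueError("counts must be non-negative")
    return coderd_replicas * builtin_per_replica + external_provisioners


if __name__ == "__main__":
    # Two replicas with the default built-in provisioners.
    print(max_concurrent_builds(2))
    # Two replicas with built-in provisioners disabled and ten external ones.
    print(max_concurrent_builds(2, builtin_per_replica=0, external_provisioners=10))
```

Treat the result as a ceiling rather than a throughput guarantee, since build duration also depends on the template and the target infrastructure.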
Registry
The Coder Registry is a platform where you can find starter templates and Modules for various cloud services and platforms.
Templates help create self-service development environments using Terraform-defined infrastructure, while Modules simplify template creation by providing common features like workspace applications, third-party integrations, or helper scripts.
Please note that the Registry is a hosted service and isn't available for offline use.
Kubernetes Infrastructure
Kubernetes is the recommended and supported platform for deploying Coder in the enterprise. It is the hosting platform of choice for a large majority of Coder's Fortune 500 customers, and it is the platform that we build and test against here at Coder.
General recommendations
In general, it is recommended to deploy Coder into its own respective cluster, separate from production applications. Keep in mind that Coder runs development workloads, so the cluster should be deployed as such, without production-level configurations.
Compute
Deploy your Kubernetes cluster with two node groups, one for Coder's control plane, and another for user workspaces (if you intend on leveraging K8s for end-user compute).
Control plane nodes
The Coder control plane node group must be static, to prevent scale down events from dropping pods, and thus dropping user connections to the dashboard UI and their workspaces.
Coder's Helm Chart supports defining nodeSelectors, affinities, and tolerations to schedule the control plane pods on the appropriate node group.
Workspace nodes
Coder workspaces can be deployed either as Pods or Deployments in Kubernetes. See our example Kubernetes workspace template.
Configure the workspace node group to be auto-scaling, to dynamically allocate compute as users start/stop workspaces at the beginning and end of their day. Set nodeSelectors, affinities, and tolerations in Coder templates to assign workspaces to the given node group:
```hcl
resource "kubernetes_deployment" "coder" {
  spec {
    template {
      metadata {
        labels = {
          app = "coder-workspace"
        }
      }
      spec {
        affinity {
          pod_anti_affinity {
            preferred_during_scheduling_ignored_during_execution {
              weight = 1
              pod_affinity_term {
                label_selector {
                  match_expressions {
                    key      = "app.kubernetes.io/instance"
                    operator = "In"
                    values   = ["coder-workspace"]
                  }
                }
                topology_key = # add your node group label here
              }
            }
          }
        }
        # The Terraform Kubernetes provider names this block `toleration`
        # (singular); repeat the block once per toleration.
        toleration {
          # Add your tolerations here
        }
        # `node_selector` is a map argument, not a nested block.
        node_selector = {
          # Add your node selectors here
        }
        container {
          image = "coder-workspace:latest"
          name  = "dev"
        }
      }
    }
  }
}
```
Node sizing
For sizing recommendations, see the below reference architectures:
AWS Instance Types
For production AWS deployments, we recommend using non-burstable instance types, such as `m5` or `c5`, instead of burstable instances, such as `t3`. Burstable instances can experience significant performance degradation once CPU credits are exhausted, leading to poor user experience under sustained load.
Component | Recommended Instance Type | Reason |
---|---|---|
coderd nodes | m5 | Balanced compute and memory for API and UI serving. |
Provisioner nodes | c5 | Compute-optimized performance for faster builds. |
Workspace nodes | m5 | Balanced performance for general development workloads. |
Database nodes | db.m5 | Consistent database performance for reliable operations. |
Networking
It is likely your enterprise deploys Kubernetes clusters with various networking restrictions. With this in mind, Coder requires the following connectivity:
- Egress from workspace compute to the Coder control plane pods
- Egress from control plane pods to Coder's PostgreSQL database
- Egress from control plane pods to git and package repositories
- Ingress from user devices to the control plane Load Balancer or Ingress controller
We recommend configuring your network policies in accordance with the above. Note that Coder workspaces do not require any ports to be open.
Storage
If running Coder workspaces as Kubernetes Pods or Deployments, you will need to assign persistent storage. We recommend leveraging a supported Container Storage Interface (CSI) driver in your cluster, with Dynamic Provisioning and read/write, to provide on-demand storage to end-user workspaces.
The following Kubernetes volume types have been validated by Coder internally, and/or by our customers:
Our example Kubernetes workspace template provisions a PersistentVolumeClaim block storage device, attached to the Deployment.
It is not recommended to mount volumes from the host node(s) into workspaces, for security and reliability purposes. The below volume types are not recommended for use with Coder:
Note that Coder's control plane filesystem is ephemeral, so no persistent storage is required.
PostgreSQL database
Coder requires access to an external PostgreSQL database to store user data, workspace state, template files, and more. Depending on the scale of the user-base, workspace activity, and High Availability requirements, the amount of CPU and memory resources required by Coder's database may differ.
Disaster recovery
Prepare internal scripts for dumping and restoring your database. We recommend scheduling regular database backups, especially before upgrading Coder to a new release. Coder does not support downgrades without first restoring the database to the prior version.
Performance efficiency
We highly recommend deploying the PostgreSQL instance in the same region (and if possible, same availability zone) as the Coder server to optimize for low latency connections. We recommend keeping latency under 10ms between the Coder server and database.
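One rough way to check the 10ms guideline is to time TCP connections from a host near the Coder server to the database port. The sketch below uses only the Python standard library. The helper names `tcp_connect_latency_ms` and `within_budget` are hypothetical, and TCP connect time is only an approximation of one network round trip, not a measurement of query latency.

```python
import socket
import statistics
import time


def tcp_connect_latency_ms(host: str, port: int, samples: int = 5,
                           timeout: float = 2.0) -> float:
    """Median time, in milliseconds, to open a TCP connection to host:port."""
    timings = []
    for _ in range(samples):
        start = time.perf_counter()
        # Open and immediately close the connection; only the handshake is timed.
        with socket.create_connection((host, port), timeout=timeout):
            pass
        timings.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(timings)


def within_budget(latency_ms: float, budget_ms: float = 10.0) -> bool:
    """True when the measured latency is under the recommended budget."""
    return latency_ms < budget_ms


# Example usage (the hostname is a placeholder, 5432 is the default PostgreSQL port):
#   latency = tcp_connect_latency_ms("db.example.internal", 5432)
#   print(latency, within_budget(latency))
```

Run the check from the same network location as the Coder control plane, since latency measured from elsewhere says little about what `coderd` experiences.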
When determining scaling requirements, take into account the following considerations:
- `2 vCPU x 8 GB RAM x 512 GB storage`: A baseline for database requirements for a Coder deployment with fewer than 1000 users and a low activity level (30% active users). This capacity should be sufficient to support 100 external provisioners.
- Storage size depends on user activity, workspace builds, log verbosity, overhead on database encryption, etc.
- Allocate two additional CPU cores to the database instance for every 1000 active users.
- Enable High Availability mode for the database engine for large-scale deployments.
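The CPU guidance above can be turned into a quick estimate. This is a minimal sketch of one possible reading: start from the 2 vCPU baseline and add two cores for each full block of 1000 active users. The function name `estimate_db_vcpus` and the choice to round down to full blocks are assumptions, not an official sizing formula.

```python
BASELINE_VCPU = 2                # baseline from the guidance above
EXTRA_VCPU_PER_1000_ACTIVE = 2   # "two additional CPU cores for every 1000 active users"


def estimate_db_vcpus(active_users: int) -> int:
    """Estimate database vCPUs from the number of active users.

    Assumption: only full blocks of 1000 active users add cores.
    """
    if active_users < 0:
        raise ValueError("active_users must be non-negative")
    full_blocks = active_users // 1000
    return BASELINE_VCPU + EXTRA_VCPU_PER_1000_ACTIVE * full_blocks


if __name__ == "__main__":
    for users in (300, 1000, 2500):
        print(users, estimate_db_vcpus(users))
```

Use the estimate as a starting point and adjust it against observed CPU utilization and database latency.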
Recommended instance types by cloud provider
For production deployments, we recommend using dedicated compute instances rather than burstable instances (like AWS t-family) which provide inconsistent CPU performance. Below are recommended instance types for each major cloud provider:
AWS (RDS/Aurora PostgreSQL)
- Small deployments (<1000 users): `db.m6i.large` (2 vCPU, 8 GB RAM) or `db.r6i.large` (2 vCPU, 16 GB RAM)
- Medium deployments (1000-2000 users): `db.m6i.xlarge` (4 vCPU, 16 GB RAM) or `db.r6i.xlarge` (4 vCPU, 32 GB RAM)
- Large deployments (2000+ users): `db.m6i.2xlarge` (8 vCPU, 32 GB RAM) or `db.r6i.2xlarge` (8 vCPU, 64 GB RAM)
Azure (Azure Database for PostgreSQL)
- Small deployments (<1000 users): `Standard_D2s_v5` (2 vCPU, 8 GB RAM) or `Standard_E2s_v5` (2 vCPU, 16 GB RAM)
- Medium deployments (1000-2000 users): `Standard_D4s_v5` (4 vCPU, 16 GB RAM) or `Standard_E4s_v5` (4 vCPU, 32 GB RAM)
- Large deployments (2000+ users): `Standard_D8s_v5` (8 vCPU, 32 GB RAM) or `Standard_E8s_v5` (8 vCPU, 64 GB RAM)
Google Cloud (Cloud SQL for PostgreSQL)
- Small deployments (<1000 users): `db-perf-optimized-N-2` (2 vCPU, 16 GB RAM)
- Medium deployments (1000-2000 users): `db-perf-optimized-N-4` (4 vCPU, 32 GB RAM)
- Large deployments (2000+ users): `db-perf-optimized-N-8` (8 vCPU, 64 GB RAM)
Storage recommendations
For optimal database performance, use the following storage types:
- AWS RDS/Aurora: Use `gp3` (General Purpose SSD) volumes with at least 3,000 IOPS for production workloads. For high-performance requirements, consider `io1` or `io2` volumes with provisioned IOPS.
- Azure Database for PostgreSQL: Use Premium SSD (P-series) with appropriate IOPS and throughput provisioning. Standard SSD can be used for development/test environments.
- Google Cloud SQL: Use SSD persistent disks for production workloads. Standard (HDD) persistent disks are suitable only for development or low-performance requirements.
If you enable database encryption in Coder, consider allocating an additional CPU core to every `coderd` replica.
Resource utilization guidelines
Below are general recommendations for sizing your PostgreSQL instance:
- Increase the number of vCPUs if CPU utilization or database latency is high.
- Allocate extra memory if database performance is poor, CPU utilization is low, and memory utilization is high.
- Use faster disk options (higher IOPS), such as SSDs or NVMe drives, to improve performance and possibly reduce database load.
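The first two guidelines above can be expressed as a small decision helper. This is only a sketch: the thresholds `HIGH_CPU`, `HIGH_MEMORY`, and `HIGH_LATENCY_MS` are hypothetical placeholders to tune for your environment, "poor performance" is approximated here by high latency, and the disk guideline is left out because the text gives no trigger condition for it.

```python
from dataclasses import dataclass

# Hypothetical thresholds; none of these values come from Coder's guidance.
HIGH_CPU = 0.80          # fraction of CPU in use
HIGH_MEMORY = 0.80       # fraction of memory in use
HIGH_LATENCY_MS = 50.0   # observed database latency


@dataclass
class DbMetrics:
    cpu_utilization: float     # 0.0 to 1.0
    memory_utilization: float  # 0.0 to 1.0
    latency_ms: float


def sizing_advice(metrics: DbMetrics) -> list[str]:
    """Map observed metrics to the sizing actions listed above."""
    advice = []
    cpu_high = metrics.cpu_utilization > HIGH_CPU
    latency_high = metrics.latency_ms > HIGH_LATENCY_MS
    # Guideline 1: more vCPUs when CPU utilization or latency is high.
    if cpu_high or latency_high:
        advice.append("add vCPUs")
    # Guideline 2: more memory when performance is poor, CPU is not high,
    # and memory utilization is high.
    if latency_high and not cpu_high and metrics.memory_utilization > HIGH_MEMORY:
        advice.append("add memory")
    return advice


if __name__ == "__main__":
    print(sizing_advice(DbMetrics(0.90, 0.50, 5.0)))
    print(sizing_advice(DbMetrics(0.20, 0.90, 80.0)))
```

When both actions are suggested, address the more saturated resource first and re-measure before changing the other.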
Operational readiness
Operational readiness in Coder is about ensuring that everything is set up correctly before launching a platform into production. It involves making sure that the service is reliable, secure, and scales easily according to user-base needs. Operational readiness is crucial because it helps prevent issues that could affect workspace users' experience once the platform is live.
Helm Chart Configuration
- Reference our Helm chart values file and identify the required values for deployment.
- Create a `values.yaml` and add it to your version control system.
- Determine the necessary environment variables. Here is the full list of supported server environment variables.
- Follow our documented steps for installing Coder via Helm.
Template configuration
- Establish dedicated accounts for users with the Template Administrator role.
- Maintain Coder templates using version control.
- Consider implementing a GitOps workflow to automatically push new template versions into Coder from git. For example, on GitHub, you can use the Setup Coder action.
- Evaluate enabling automatic template updates upon workspace startup.
Observability
- Enable the Prometheus endpoint (environment variable: `CODER_PROMETHEUS_ENABLE`).
- Deploy the Coder Observability bundle to leverage pre-configured dashboards, alerts, and runbooks for monitoring Coder. This includes integrations between Prometheus, Grafana, Loki, and Alertmanager.
- Review the Prometheus response and set up alarms on selected metrics.
User support
- Incorporate support links into internal documentation accessible from the user context menu. Ensure that hyperlinks are valid and lead to up-to-date materials.
- Encourage the use of `coder support bundle` to allow workspace users to generate and provide network-related diagnostic data.