Validated Architectures

Many customers operate Coder in complex organizational environments, consisting of multiple business units, agencies, and/or subsidiaries. This can lead to numerous Coder deployments, due to discrepancies in regulatory compliance, data sovereignty, and level of funding across groups. The Coder Validated Architecture (CVA) prescribes a Kubernetes-based deployment approach, enabling your organization to deploy a stable Coder instance that is easier to maintain and troubleshoot.

The following sections will detail the components of the Coder Validated Architecture, provide guidance on how to configure and deploy these components, and offer insights into how to maintain and troubleshoot your Coder environment.

Who is this document for?

This guide targets the following personas. It assumes a basic understanding of cloud/on-premise computing, containerization, and the Coder platform.

Role	Description
Platform Engineers	Responsible for deploying and operating the Coder deployment and infrastructure
Enterprise Architects	Responsible for architecting Coder deployments to meet enterprise requirements
Managed Service Providers	Entities that deploy and run Coder software as a service for customers

CVA Guidance

CVA provides:	CVA does not provide:
Single and multi-region K8s deployment options	Prescribing OS, or cloud vs. on-premise
Reference architectures for up to 3,000 users	An approval of your architecture; the CVA solely provides recommendations and guidelines
Best practices for building a Coder deployment	Recommendations for every possible deployment scenario

For higher level design principles and architectural best practices, see Coder's Well-Architected Framework.

General concepts

This section outlines core concepts and terminology essential for understanding Coder's architecture and deployment strategies.

Administrator

An administrator is a user role within the Coder platform with elevated privileges. Admins have access to administrative functions such as user management, template definitions, insights, and deployment configuration.

Coder control plane

Coder's control plane, also known as coderd, is the main service recommended for deployment with multiple replicas to ensure high availability. It provides an API for managing workspaces and templates, and serves the dashboard UI. In addition, each coderd replica hosts 3 Terraform provisioners by default.

User

A user is an individual who utilizes the Coder platform to develop, test, and deploy applications using workspaces. Users can select available templates to provision workspaces. They interact with Coder using the web interface, the CLI tool, or directly calling API methods.

Workspace

A workspace refers to an isolated development environment where users can write, build, and run code. Workspaces are fully configurable and can be tailored to specific project requirements, providing developers with a consistent and efficient development environment. Workspaces can be autostarted and autostopped, enabling efficient resource management.

Users can connect to workspaces using SSH or via workspace applications like code-server, facilitating collaboration and remote access. Additionally, workspaces can be parameterized, allowing users to customize settings and configurations based on their unique needs. Workspaces are instantiated using Coder templates and deployed on resources created by provisioners.

Template

A template in Coder is a predefined configuration for creating workspaces. Templates streamline the process of workspace creation by providing pre-configured settings, tooling, and dependencies. They are built by template administrators on top of Terraform, allowing for efficient management of infrastructure resources. Additionally, templates can utilize Coder modules to leverage existing features shared with other templates, enhancing flexibility and consistency across deployments. Templates describe provisioning rules for infrastructure resources offered by Terraform providers.

Workspace Proxy

A workspace proxy serves as a relay connection option for developers connecting to their workspace over SSH, a workspace app, or through port forwarding. It helps reduce network latency for geo-distributed teams by minimizing the distance network traffic needs to travel. Notably, workspace proxies do not handle dashboard connections or API calls.

Provisioner

Provisioners in Coder execute Terraform during workspace and template builds. While the platform includes built-in provisioner daemons by default, there are advantages to employing external provisioners. These external daemons provide secure build environments and reduce server load, improving performance and scalability. Each provisioner can handle a single concurrent workspace build, allowing for efficient resource allocation and workload management.
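Because each provisioner handles one build at a time, and each coderd replica hosts 3 built-in provisioners by default, the number of builds that can run concurrently can be estimated with simple arithmetic. The sketch below is illustrative only: the function name and its parameters are hypothetical, and it assumes built-in and external provisioners simply add up.

```python
def concurrent_build_capacity(coderd_replicas: int,
                              external_provisioners: int = 0,
                              builtin_per_replica: int = 3) -> int:
    """Estimate how many workspace builds can run at the same time.

    Assumes one concurrent build per provisioner (as stated above) and
    that built-in and external provisioners contribute additively.
    Set builtin_per_replica to 0 to model a deployment that relies on
    external provisioners only (an assumption of this sketch).
    """
    if min(coderd_replicas, external_provisioners, builtin_per_replica) < 0:
        raise ValueError("counts must be non-negative")
    return coderd_replicas * builtin_per_replica + external_provisioners
```

For example, two coderd replicas with default settings give an estimated capacity of 6 concurrent builds, and adding 10 external provisioners to three replicas gives 19.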

Registry

The Coder Registry is a platform where you can find starter templates and Modules for various cloud services and platforms.

Templates help create self-service development environments using Terraform-defined infrastructure, while Modules simplify template creation by providing common features like workspace applications, third-party integrations, or helper scripts.

Please note that the Registry is a hosted service and isn't available for offline use.

Kubernetes Infrastructure

Kubernetes is the recommended and supported platform for deploying Coder in the enterprise. It is the hosting platform of choice for a large majority of Coder's Fortune 500 customers, and it is the platform we build and test against here at Coder.

General recommendations

In general, it is recommended to deploy Coder into its own respective cluster, separate from production applications. Keep in mind that Coder runs development workloads, so the cluster should be deployed as such, without production-level configurations.

Compute

Deploy your Kubernetes cluster with two node groups, one for Coder's control plane, and another for user workspaces (if you intend on leveraging K8s for end-user compute).

Control plane nodes

The Coder control plane node group must be static, to prevent scale down events from dropping pods, and thus dropping user connections to the dashboard UI and their workspaces.

Coder's Helm Chart supports defining nodeSelectors, affinities, and tolerations to schedule the control plane pods on the appropriate node group.

Workspace nodes

Coder workspaces can be deployed either as Pods or Deployments in Kubernetes. See our example Kubernetes workspace template. Configure the workspace node group to be auto-scaling, to dynamically allocate compute as users start/stop workspaces at the beginning and end of their day. Set nodeSelectors, affinities, and tolerations in Coder templates to assign workspaces to the given node group:

resource "kubernetes_deployment" "coder" {
  spec {
    template {
      metadata {
        labels = {
          app = "coder-workspace"
        }
      }
      spec {
        affinity {
          pod_anti_affinity {
            preferred_during_scheduling_ignored_during_execution {
              weight = 1
              pod_affinity_term {
                label_selector {
                  match_expressions {
                    key      = "app.kubernetes.io/instance"
                    operator = "In"
                    values   = ["coder-workspace"]
                  }
                }
                topology_key = # add your node group label here
              }
            }
          }
        }
        tolerations {
          # Add your tolerations here
        }
        node_selector {
          # Add your node selectors here
        }
        container {
          image = "coder-workspace:latest"
          name  = "dev"
        }
      }
    }
  }
}

Node sizing

For sizing recommendations, see the below reference architectures:

AWS Instance Types

For production AWS deployments, we recommend using non-burstable instance types, such as m5 or c5, instead of burstable instances, such as t3. Burstable instances can experience significant performance degradation once CPU credits are exhausted, leading to poor user experience under sustained load.

Component	Recommended Instance Type	Reason
coderd nodes	`m5`	Balanced compute and memory for API and UI serving.
Provisioner nodes	`c5`	Compute-optimized performance for faster builds.
Workspace nodes	`m5`	Balanced performance for general development workloads.
Database nodes	`db.m5`	Consistent database performance for reliable operations.

Networking

It is likely your enterprise deploys Kubernetes clusters with various networking restrictions. With this in mind, Coder requires the following connectivity:

Egress from workspace compute to the Coder control plane pods
Egress from control plane pods to Coder's PostgreSQL database
Egress from control plane pods to git and package repositories
Ingress from user devices to the control plane Load Balancer or Ingress controller

We recommend configuring your network policies in accordance with the above. Note that Coder workspaces do not require any ports to be open.

Storage

If running Coder workspaces as Kubernetes Pods or Deployments, you will need to assign persistent storage. We recommend leveraging a supported Container Storage Interface (CSI) driver in your cluster, with Dynamic Provisioning and read/write, to provide on-demand storage to end-user workspaces.

The following Kubernetes volume types have been validated by Coder internally, and/or by our customers:

Our example Kubernetes workspace template provisions a PersistentVolumeClaim block storage device, attached to the Deployment.

It is not recommended to mount volumes from the host node(s) into workspaces, for security and reliability purposes. The below volume types are not recommended for use with Coder:

Note that Coder's control plane filesystem is ephemeral, so no persistent storage is required.

PostgreSQL database

Coder requires access to an external PostgreSQL database to store user data, workspace state, template files, and more. Depending on the scale of the user-base, workspace activity, and High Availability requirements, the amount of CPU and memory resources required by Coder's database may differ.

Disaster recovery

Prepare internal scripts for dumping and restoring your database. We recommend scheduling regular database backups, especially before upgrading Coder to a new release. Coder does not support downgrades without first restoring the database to the prior version.

Performance efficiency

We highly recommend deploying the PostgreSQL instance in the same region (and if possible, same availability zone) as the Coder server to optimize for low latency connections. We recommend keeping latency under 10ms between the Coder server and database.

When determining scaling requirements, take into account the following considerations:

2 vCPU x 8 GB RAM x 512 GB storage: A baseline for database requirements for a Coder deployment with fewer than 1000 users and a low activity level (30% active users). This capacity should be sufficient to support 100 external provisioners.
Storage size depends on user activity, workspace builds, log verbosity, overhead on database encryption, etc.
Allocate two additional CPU cores to the database instance for every 1000 active users.
Enable High Availability mode for database engine for large scale deployments.
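The CPU guidance above can be turned into a rough estimate. The following sketch is one possible reading of it, not an official sizing formula: it starts from the 2 vCPU baseline and adds two cores for each full block of 1000 active users. The function name and the rounding choice (counting only complete blocks of 1000) are assumptions made for illustration.

```python
BASELINE_VCPU = 2            # baseline from the guidance above
EXTRA_VCPU_PER_BLOCK = 2     # "two additional CPU cores"
ACTIVE_USERS_PER_BLOCK = 1000


def estimate_db_vcpu(active_users: int) -> int:
    """Estimate database vCPUs for a given number of active users.

    Assumption: only complete blocks of 1000 active users add cores,
    so 999 active users still map to the baseline.
    """
    if active_users < 0:
        raise ValueError("active_users must be non-negative")
    blocks = active_users // ACTIVE_USERS_PER_BLOCK
    return BASELINE_VCPU + EXTRA_VCPU_PER_BLOCK * blocks
```

Under these assumptions, 300 active users (30% of a 1000-user deployment) stay at the 2 vCPU baseline, while 2500 active users yield an estimate of 6 vCPU. Treat the result as a starting point and adjust it based on observed utilization.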

Recommended instance types by cloud provider

For production deployments, we recommend using dedicated compute instances rather than burstable instances (like AWS t-family) which provide inconsistent CPU performance. Below are recommended instance types for each major cloud provider:

AWS (RDS/Aurora PostgreSQL)

Small deployments (<1000 users): db.m6i.large (2 vCPU, 8 GB RAM) or db.r6i.large (2 vCPU, 16 GB RAM)
Medium deployments (1000-2000 users): db.m6i.xlarge (4 vCPU, 16 GB RAM) or db.r6i.xlarge (4 vCPU, 32 GB RAM)
Large deployments (2000+ users): db.m6i.2xlarge (8 vCPU, 32 GB RAM) or db.r6i.2xlarge (8 vCPU, 64 GB RAM)

Azure (Azure Database for PostgreSQL)

Small deployments (<1000 users): Standard_D2s_v5 (2 vCPU, 8 GB RAM) or Standard_E2s_v5 (2 vCPU, 16 GB RAM)
Medium deployments (1000-2000 users): Standard_D4s_v5 (4 vCPU, 16 GB RAM) or Standard_E4s_v5 (4 vCPU, 32 GB RAM)
Large deployments (2000+ users): Standard_D8s_v5 (8 vCPU, 32 GB RAM) or Standard_E8s_v5 (8 vCPU, 64 GB RAM)

Google Cloud (Cloud SQL for PostgreSQL)

Small deployments (<1000 users): db-perf-optimized-N-2 (2 vCPU, 16 GB RAM)
Medium deployments (1000-2000 users): db-perf-optimized-N-4 (4 vCPU, 32 GB RAM)
Large deployments (2000+ users): db-perf-optimized-N-8 (8 vCPU, 64 GB RAM)
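The per-provider tiers listed in this section can be collected into a single lookup, which may be convenient in internal sizing scripts. This is a minimal sketch: the dictionary keys ("aws", "azure", "gcp") and the function name are hypothetical, and because the tiers above overlap at exactly 2000 users, the sketch assumes that 2000 users still counts as a medium deployment.

```python
# Instance options per provider, ordered small, medium, large,
# transcribed from the lists in this section.
DB_INSTANCE_TIERS = {
    "aws": [
        ["db.m6i.large", "db.r6i.large"],
        ["db.m6i.xlarge", "db.r6i.xlarge"],
        ["db.m6i.2xlarge", "db.r6i.2xlarge"],
    ],
    "azure": [
        ["Standard_D2s_v5", "Standard_E2s_v5"],
        ["Standard_D4s_v5", "Standard_E4s_v5"],
        ["Standard_D8s_v5", "Standard_E8s_v5"],
    ],
    "gcp": [
        ["db-perf-optimized-N-2"],
        ["db-perf-optimized-N-4"],
        ["db-perf-optimized-N-8"],
    ],
}


def recommended_db_instances(provider: str, users: int) -> list[str]:
    """Return the recommended instance options for a user count."""
    if users < 0:
        raise ValueError("users must be non-negative")
    tiers = DB_INSTANCE_TIERS[provider.lower()]  # KeyError if unknown
    if users < 1000:
        tier = 0   # small
    elif users <= 2000:
        tier = 1   # medium (assumption: 2000 itself is medium)
    else:
        tier = 2   # large
    return list(tiers[tier])
```

For example, `recommended_db_instances("aws", 500)` returns the two small-tier RDS options, and `recommended_db_instances("gcp", 1500)` returns the single medium-tier Cloud SQL option.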

Storage recommendations

For optimal database performance, use the following storage types:

AWS RDS/Aurora: Use gp3 (General Purpose SSD) volumes with at least 3,000 IOPS for production workloads. For high-performance requirements, consider io1 or io2 volumes with provisioned IOPS.
Azure Database for PostgreSQL: Use Premium SSD (P-series) with appropriate IOPS and throughput provisioning. Standard SSD can be used for development/test environments.
Google Cloud SQL: Use SSD persistent disks for production workloads. Standard (HDD) persistent disks are suitable only for development or low-performance requirements.

If you enable database encryption in Coder, consider allocating an additional CPU core to every coderd replica.

Resource utilization guidelines

Below are general recommendations for sizing your PostgreSQL instance:

Increase number of vCPU if CPU utilization or database latency is high.
Allocate extra memory if database performance is poor, CPU utilization is low, and memory utilization is high.
Utilize faster disk options (higher IOPS) such as SSDs or NVMe drives to improve performance and possibly reduce database load.
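The guidelines above can be read as a small decision procedure. The sketch below encodes them under explicit assumptions: the function name is hypothetical, the `high` and `low` utilization thresholds (0.8 and 0.3) are illustrative values not taken from this guide, and suggesting faster storage only when nothing else applies is a choice made for this example.

```python
def db_tuning_hints(cpu_util: float,
                    mem_util: float,
                    latency_high: bool = False,
                    performance_poor: bool = False,
                    high: float = 0.8,   # hypothetical "high" threshold
                    low: float = 0.3) -> list[str]:
    """Map observed database metrics to the sizing guidelines above.

    cpu_util and mem_util are fractions between 0.0 and 1.0.
    """
    hints = []
    # High CPU utilization or high database latency: add vCPUs.
    if cpu_util >= high or latency_high:
        hints.append("increase vCPU count")
    # Poor performance with low CPU and high memory use: add memory.
    if performance_poor and cpu_util <= low and mem_util >= high:
        hints.append("allocate extra memory")
    # Assumption: fall back to faster disks when nothing else matched.
    if performance_poor and not hints:
        hints.append("consider faster storage (higher IOPS)")
    return hints
```

Calling `db_tuning_hints(0.2, 0.9, performance_poor=True)` suggests extra memory, for instance. The thresholds should be tuned to whatever your monitoring defines as high or low utilization.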

Operational readiness

Operational readiness in Coder is about ensuring that everything is set up correctly before launching a platform into production. It involves making sure that the service is reliable, secure, and scales easily according to user-base needs. Operational readiness is crucial because it helps prevent issues that could affect the workspace users' experience once the platform is live.

Helm Chart Configuration

Reference our Helm chart values file and identify the required values for deployment.
Create a values.yaml and add it to your version control system.
Determine the necessary environment variables. Here is the full list of supported server environment variables.
Follow our documented steps for installing Coder via Helm.

Template configuration

Establish dedicated accounts for users with the Template Administrator role.
Maintain Coder templates using version control.
Consider implementing a GitOps workflow to automatically push new template versions into Coder from git. For example, on GitHub, you can use the Setup Coder action.
Evaluate enabling automatic template updates upon workspace startup.

Observability

Enable the Prometheus endpoint (environment variable: CODER_PROMETHEUS_ENABLE).
Deploy the Coder Observability bundle to leverage pre-configured dashboards, alerts, and runbooks for monitoring Coder. This includes integrations between Prometheus, Grafana, Loki, and Alertmanager.
Review the Prometheus response and set up alarms on selected metrics.

User support

Incorporate support links into internal documentation accessible from the user context menu. Ensure that hyperlinks are valid and lead to up-to-date materials.
Encourage the use of coder support bundle to allow workspace users to generate and provide network-related diagnostic data.
