@@ -1,90 +1,4 @@

# Reference Architectures

This document provides prescriptive solutions and reference architectures to
support successful deployments of up to 3000 users and outlines, at a high
level, the methodology currently used to scale-test Coder.

## General concepts

This section outlines core concepts and terminology essential for understanding
Coder's architecture and deployment strategies.

### Administrator

An administrator is a user role within the Coder platform with elevated
privileges. Admins have access to administrative functions such as user
management, template definitions, insights, and deployment configuration.

### Coder

Coder, also known as _coderd_, is the main service. Deploying it with multiple
replicas is recommended to ensure high availability. It provides an API for
managing workspaces and templates. Each _coderd_ replica can host multiple
[provisioners](#provisioner).

### User

A user is an individual who uses the Coder platform to develop, test, and deploy
applications using workspaces. Users can select available templates to provision
workspaces. They interact with Coder using the web interface, the CLI tool, or
by calling API methods directly.

### Workspace

A workspace is an isolated development environment where users can write, build,
and run code. Workspaces are fully configurable and can be tailored to specific
project requirements, providing developers with a consistent and efficient
development environment.

Workspaces can be autostarted and autostopped, enabling efficient resource
management. Users can connect to workspaces using SSH or via workspace
applications like `code-server`, facilitating collaboration and remote access.
Additionally, workspaces can be parameterized, allowing users to customize
settings and configurations based on their unique needs.

Workspaces are instantiated using Coder templates and deployed on resources
created by provisioners.
### Template

A template in Coder is a predefined configuration for creating workspaces.
Templates streamline the process of workspace creation by providing
pre-configured settings, tooling, and dependencies. They are built by template
administrators on top of Terraform, allowing for efficient management of
infrastructure resources. Additionally, templates can utilize Coder modules to
leverage existing features shared with other templates, enhancing flexibility
and consistency across deployments. Templates describe provisioning rules for
infrastructure resources offered by Terraform providers.

### Workspace Proxy

A workspace proxy serves as a relay connection option for developers connecting
to their workspace over SSH, a workspace app, or through port forwarding. It
helps reduce network latency for geo-distributed teams by minimizing the
distance network traffic needs to travel. Notably, workspace proxies do not
handle dashboard connections or API calls.

### Provisioner

Provisioners in Coder execute Terraform during workspace and template builds.
While the platform includes built-in provisioner daemons by default, there are
advantages to employing external provisioners. These external daemons provide
secure build environments and reduce server load, improving performance and
scalability. Each provisioner can handle a single concurrent workspace build,
allowing for efficient resource allocation and workload management.

### Registry

The Coder Registry is a platform where you can find starter templates and
_Modules_ for various cloud services and platforms.

Templates help create self-service development environments using
Terraform-defined infrastructure, while _Modules_ simplify template creation by
providing common features like workspace applications, third-party integrations,
or helper scripts.

Please note that the Registry is a hosted service and isn't available for
offline use.
## Scale Testing

Scaling Coder involves planning and testing to ensure it can handle more load
without compromising service. This process encompasses infrastructure setup,

@@ -95,7 +9,7 @@

A dedicated Kubernetes cluster for Coder is a Kubernetes cluster specifically
configured to host and manage Coder workloads. Kubernetes provides container
orchestration capabilities, allowing Coder to efficiently deploy, scale, and
manage workspaces across a distributed infrastructure. This ensures high
availability, fault tolerance, and scalability for Coder deployments. Coder is
deployed on this cluster using the
[Helm chart](../../install/kubernetes.md#install-coder-with-helm).

@@ -315,96 +229,3 @@

Scaling down workspace nodes to zero is not recommended, as it will result in
longer wait times for workspace provisioning by users. However, this may be
necessary for workspaces with special resource requirements (e.g. GPUs) that
incur significant cost overheads.

### Data plane: External database

While running in production, Coder requires access to an external PostgreSQL
database. Depending on the scale of the user-base, workspace activity, and High
Availability requirements, the amount of CPU and memory resources required by
Coder's database may differ.

#### Scaling formula

When determining scaling requirements, take into account the following
considerations:

- `2 vCPU x 8 GB RAM x 512 GB storage`: A baseline for database requirements for
  a Coder deployment with fewer than 1000 users and a low activity level (30%
  active users). This capacity should be sufficient to support 100 external
  provisioners.
- Storage size depends on user activity, workspace builds, log verbosity,
  overhead on database encryption, etc.
- Allocate two additional CPU cores to the database instance for every 1000
  active users.
- Enable _High Availability_ mode for the database engine for large-scale
  deployments.

If you enable [database encryption](../encryption.md) in Coder, consider
allocating an additional CPU core to every `coderd` replica.

#### Performance optimization guidelines

We provide the following general recommendations for PostgreSQL settings:

- Increase the number of vCPUs if CPU utilization or database latency is high.
- Allocate extra memory if database performance is poor, CPU utilization is low,
  and memory utilization is high.
- Use faster disk options (higher IOPS), such as SSDs or NVMe drives, to improve
  performance and possibly reduce database load.

## Operational readiness

Operational readiness in Coder is about ensuring that everything is set up
correctly before launching a platform into production. It involves making sure
that the service is reliable, secure, and scales easily according to user-base
needs. Operational readiness is crucial because it helps prevent issues that
could affect the workspace user experience once the platform is live.

Learn about Coder design principles and architectural best practices described
in the
[Well-Architected Framework](https://coder.com/blog/coder-well-architected-framework).

### Configuration

1. Identify the required Helm values for configuration.
1. Create `values.yaml` and add it to a version control system. _Note:_ it is
   highly recommended that you create a custom `values.yaml` as opposed to
   copying the entire default values.
1. Determine the necessary environment variables.

### Template configuration

1. Establish a dedicated user account for the _Template Administrator_.
1. Maintain Coder templates using version control.
1. Consider implementing a GitOps workflow to automatically push new templates.
   For example, on GitHub, you can use the
   [Update Coder Template](https://github.com/marketplace/actions/update-coder-template)
   action.
1. Evaluate enabling automatic template updates upon workspace startup.
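
Returning to the database scaling formula under "Data plane: External database",
the arithmetic can be sketched in Python. This is a rough illustration rather
than an official sizing tool: the names `DatabaseSizing` and
`estimate_database_sizing` are hypothetical, and the sketch assumes the two extra
CPU cores are added for each full block of 1000 active users on top of the
baseline. The source does not state a rounding rule, nor how RAM and storage
should grow, so those values stay at the baseline here.

```python
from dataclasses import dataclass

# Baseline from the scaling formula: 2 vCPU x 8 GB RAM x 512 GB storage.
BASELINE_VCPU = 2
BASELINE_RAM_GB = 8
BASELINE_STORAGE_GB = 512

# "Allocate two additional CPU cores ... for every 1000 active users."
EXTRA_VCPU_PER_1000_ACTIVE_USERS = 2


@dataclass(frozen=True)
class DatabaseSizing:
    """Hypothetical container for an estimated database instance size."""

    vcpu: int
    ram_gb: int
    storage_gb: int


def estimate_database_sizing(active_users: int) -> DatabaseSizing:
    """Estimate database resources for a given number of active users.

    Assumption: extra cores are counted per *full* 1000 active users
    (floor division). RAM and storage are left at the baseline because
    the guidance gives no formula for them.
    """
    if active_users < 0:
        raise ValueError("active_users must be non-negative")
    extra_vcpu = EXTRA_VCPU_PER_1000_ACTIVE_USERS * (active_users // 1000)
    return DatabaseSizing(
        vcpu=BASELINE_VCPU + extra_vcpu,
        ram_gb=BASELINE_RAM_GB,
        storage_gb=BASELINE_STORAGE_GB,
    )
```

For example, under these assumptions `estimate_database_sizing(2500)` yields 6
vCPUs with baseline memory and storage. Treat the result as a starting point and
adjust it using the performance optimization guidelines and observed metrics.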
### Deployment

1. Leverage automation tooling to automate deployment and upgrades of Coder.

### Observability

1. Enable the Prometheus endpoint (environment variable:
   `CODER_PROMETHEUS_ENABLE`).
1. Deploy a visual monitoring system such as Grafana for metrics visualization.
1. Deploy a centralized log aggregation solution to collect and monitor
   application logs.
1. Review the [Prometheus response](../prometheus.md) and set up alarms on
   selected metrics.

### Database backups

1. Prepare internal scripts for dumping and restoring databases.
1. Schedule regular database backups, especially before release upgrades.

### User support

1. Incorporate [support links](../appearance.md#support-links) into internal
   documentation accessible from the user context menu. Ensure that hyperlinks
   are valid and lead to up-to-date materials.
1. Encourage the use of `coder support bundle` to allow workspace users to
   generate and provide network-related diagnostic data.