# How to create a GPU-enabled development environment with Coder using the Docker provider

## Description

I'm trying to create a Coder template that provides a GPU-accelerated development environment for machine learning work. I want to build a custom Docker image with CUDA support and make it accessible through Coder's web interface.

## Environment

- OS: Ubuntu 20.04
- GPU: NVIDIA RTX 4090
- Docker: GPU support verified working
- Coder: 2.24
- Terraform: 1.12.2
## GPU Environment Verification

I've confirmed that my Docker + GPU setup is working correctly:

```bash
docker run --gpus all --rm pytorch/manylinux-cuda118:latest nvidia-smi
```

This command successfully shows GPU information, confirming that:

- NVIDIA Docker runtime is properly configured
- GPU passthrough to containers works
- CUDA drivers are accessible from within containers
## Requirements

Create a Coder template that builds a custom Docker image with:

- NVIDIA CUDA 11.8 support
- Python development environment with PyTorch, Jupyter Lab, etc.
- GPU monitoring and resource tracking
- VS Code and JetBrains IDE integration
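For the "custom image" part, my rough plan (an untested sketch; the image name and build-context paths below are placeholders I made up) is to let Terraform build the image through the Docker provider's `docker_image` resource instead of pulling a prebuilt one:

```hcl
# Sketch: build a custom CUDA image from a local Dockerfile rather than
# pulling a prebuilt one. The name and context/dockerfile paths here are
# placeholders, not something I have working yet.
resource "docker_image" "ml_workspace" {
  name = "coder-ml-workspace:latest"

  build {
    context    = "./build"    # directory containing the Dockerfile
    dockerfile = "Dockerfile" # e.g. FROM nvidia/cuda:11.8.0-devel-ubuntu20.04
  }
}
```

The container resource would then reference this image (I believe via `docker_image.ml_workspace.image_id`) instead of a hard-coded image string.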
The workspace should:

- Have GPU access (`nvidia-smi` should work inside the workspace)
- Provide web access to VS Code
## Current Issues

I'm encountering several challenges:

- When I create a workspace from the template I built, the resulting container does not have GPU access (`nvidia-smi` does not work inside it).
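One thing I suspect (purely a guess on my part, not verified): the `device_requests` block in my template may not actually be part of the kreuzwerker/docker provider's `docker_container` schema. The provider does document a top-level `gpus` argument that maps to `docker run --gpus`; a minimal sketch of that approach, based on my reading of the provider docs:

```hcl
resource "docker_container" "workspace" {
  # ... all other arguments as in my template ...

  # Equivalent of `docker run --gpus all`. The provider docs suggest only
  # the value "all" is currently supported here.
  gpus = var.gpu_enabled ? "all" : null

  # Alternative/complementary path: use the nvidia runtime registered with
  # the host's Docker daemon, plus the NVIDIA_* environment variables.
  runtime = var.gpu_enabled ? "nvidia" : null
}
```

Is this the recommended way to request GPUs through the Docker provider, or is there a Coder-specific mechanism I'm missing?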
Are there any existing examples or community templates for GPU-enabled Coder workspaces that I could reference? Any help, examples, or guidance would be greatly appreciated!

## Template

```hcl
terraform {
  required_providers {
    coder = {
      source = "coder/coder"
    }
    docker = {
      source = "kreuzwerker/docker"
    }
  }
}

locals {
  username = data.coder_workspace_owner.me.name
}

variable "docker_socket" {
  default     = ""
  description = "(Optional) Docker socket URI"
  type        = string
}

variable "gpu_enabled" {
  default     = true
  description = "Enable GPU support for the workspace"
  type        = bool
}

variable "gpu_count" {
  default     = "all"
  description = "Number of GPUs to allocate (use 'all' for all GPUs, or specify device IDs like '0,1')"
  type        = string
}

provider "docker" {
  # Defaulting to null if the variable is an empty string lets us have an
  # optional variable without having to set our own default
  host = var.docker_socket != "" ? var.docker_socket : null
}

data "coder_provisioner" "me" {}
data "coder_workspace" "me" {}
data "coder_workspace_owner" "me" {}

resource "coder_agent" "main" {
  arch = data.coder_provisioner.me.arch
  os   = "linux"
  startup_script = <<-EOT
    set -e

    # Create coder user if it doesn't exist
    if ! id "coder" &>/dev/null; then
      useradd --create-home --shell=/bin/bash --groups=sudo coder
      echo "coder ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers.d/90-coder
    fi

    # Ensure coder user owns the home directory
    chown -R coder:coder /home/coder

    # Switch to coder user for the rest of the setup
    sudo -u coder bash << 'EOF'
    # Prepare user home with default files on first start.
    if [ ! -f ~/.init_done ]; then
      # Create basic shell configuration
      echo 'export PATH=$PATH:/usr/local/bin' >> ~/.bashrc
      echo 'alias ll="ls -la"' >> ~/.bashrc

      # Check GPU availability
      if command -v nvidia-smi &> /dev/null; then
        echo "GPU detected:"
        nvidia-smi
        echo 'export CUDA_VISIBLE_DEVICES=all' >> ~/.bashrc
      else
        echo "No GPU detected or nvidia-smi not available"
      fi

      # Install basic Python packages
      if command -v pip &> /dev/null; then
        pip install --user jupyter notebook ipython
      fi

      touch ~/.init_done
    fi
    EOF

    # Install basic development tools
    apt-get update
    apt-get install -y curl wget git vim nano htop tree sudo

    echo "Workspace setup completed!"
  EOT

  # These environment variables allow you to make Git commits right away after
  # creating a workspace. Note that they take precedence over configuration
  # defined in ~/.gitconfig!
  env = {
    GIT_AUTHOR_NAME     = coalesce(data.coder_workspace_owner.me.full_name, data.coder_workspace_owner.me.name)
    GIT_AUTHOR_EMAIL    = "${data.coder_workspace_owner.me.email}"
    GIT_COMMITTER_NAME  = coalesce(data.coder_workspace_owner.me.full_name, data.coder_workspace_owner.me.name)
    GIT_COMMITTER_EMAIL = "${data.coder_workspace_owner.me.email}"

    # GPU-related environment variables
    NVIDIA_VISIBLE_DEVICES = var.gpu_enabled ? "all" : ""
    CUDA_VISIBLE_DEVICES   = var.gpu_enabled ? "all" : ""
  }

  # The following metadata blocks are optional. They are used to display
  # information about your workspace in the dashboard.
  metadata {
    display_name = "CPU Usage"
    key          = "0_cpu_usage"
    script       = "coder stat cpu"
    interval     = 10
    timeout      = 1
  }

  metadata {
    display_name = "RAM Usage"
    key          = "1_ram_usage"
    script       = "coder stat mem"
    interval     = 10
    timeout      = 1
  }

  metadata {
    display_name = "GPU Usage"
    key          = "2_gpu_usage"
    script       = <<-EOT
      if command -v nvidia-smi &> /dev/null; then
        nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | head -1 | xargs printf "%s%%"
      else
        echo "No GPU"
      fi
    EOT
    interval     = 10
    timeout      = 1
  }

  metadata {
    display_name = "GPU Memory"
    key          = "3_gpu_memory"
    script       = <<-EOT
      if command -v nvidia-smi &> /dev/null; then
        nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | head -1 | awk '{printf "%.1f/%.1f GB", $1/1024, $2/1024}'
      else
        echo "No GPU"
      fi
    EOT
    interval     = 10
    timeout      = 1
  }

  metadata {
    display_name = "Home Disk"
    key          = "4_home_disk"
    script       = "coder stat disk --path $${HOME}"
    interval     = 60
    timeout      = 1
  }

  metadata {
    display_name = "CPU Usage (Host)"
    key          = "5_cpu_usage_host"
    script       = "coder stat cpu --host"
    interval     = 10
    timeout      = 1
  }

  metadata {
    display_name = "Memory Usage (Host)"
    key          = "6_mem_usage_host"
    script       = "coder stat mem --host"
    interval     = 10
    timeout      = 1
  }

  metadata {
    display_name = "Load Average (Host)"
    key          = "7_load_host"
    script       = <<-EOT
      echo "`cat /proc/loadavg | awk '{ print $1 }'` `nproc`" | awk '{ printf "%0.2f", $1/$2 }'
    EOT
    interval     = 60
    timeout      = 1
  }

  metadata {
    display_name = "Swap Usage (Host)"
    key          = "8_swap_host"
    script       = <<-EOT
      free -b | awk '/^Swap/ { printf("%.1f/%.1f", $3/1024.0/1024.0/1024.0, $2/1024.0/1024.0/1024.0) }'
    EOT
    interval     = 10
    timeout      = 1
  }
}

# See https://registry.coder.com/modules/coder/code-server
module "code-server" {
  count    = data.coder_workspace.me.start_count
  source   = "registry.coder.com/modules/code-server/coder"
  version  = "~> 1.0"
  agent_id = coder_agent.main.id
  order    = 1
}

# See https://registry.coder.com/modules/coder/jetbrains-gateway
module "jetbrains_gateway" {
  count   = data.coder_workspace.me.start_count
  source  = "registry.coder.com/modules/jetbrains-gateway/coder"
  version = "~> 1.0"

  # JetBrains IDEs to make available for the user to select
  jetbrains_ides = ["IU", "PS", "WS", "PY", "CL", "GO", "RM", "RD", "RR"]
  default        = "PY" # Default to PyCharm Professional, a good fit for GPU development

  # Default folder to open when starting a JetBrains IDE
  folder = "/home/coder"

  agent_id   = coder_agent.main.id
  agent_name = "main"
  order      = 2
}

resource "docker_volume" "home_volume" {
  name = "coder-${data.coder_workspace.me.id}-home"
  # Protect the volume from being deleted due to changes in attributes.
  lifecycle {
    ignore_changes = all
  }
  # Add labels in Docker to keep track of orphan resources.
  labels {
    label = "coder.owner"
    value = data.coder_workspace_owner.me.name
  }
  labels {
    label = "coder.owner_id"
    value = data.coder_workspace_owner.me.id
  }
  labels {
    label = "coder.workspace_id"
    value = data.coder_workspace.me.id
  }
  labels {
    label = "coder.workspace_name_at_creation"
    value = data.coder_workspace.me.name
  }
}

resource "docker_container" "workspace" {
  count = data.coder_workspace.me.start_count

  # Use a GPU-enabled PyTorch image
  image = "pytorch/manylinux-cuda118:latest"

  # Uses lower() to avoid Docker restriction on container names.
  name = "coder-${data.coder_workspace_owner.me.name}-${lower(data.coder_workspace.me.name)}"

  # Hostname makes the shell more user friendly: coder@my-workspace:~$
  hostname = data.coder_workspace.me.name

  # Use the docker gateway if the access URL is 127.0.0.1
  entrypoint = ["sh", "-c", replace(coder_agent.main.init_script, "/localhost|127\\.0\\.0\\.1/", "host.docker.internal")]

  env = [
    "CODER_AGENT_TOKEN=${coder_agent.main.token}",
    "NVIDIA_VISIBLE_DEVICES=${var.gpu_enabled ? var.gpu_count : ""}",
    "CUDA_VISIBLE_DEVICES=${var.gpu_enabled ? var.gpu_count : ""}",
    "NVIDIA_DRIVER_CAPABILITIES=compute,utility"
  ]

  # GPU configuration
  runtime = var.gpu_enabled ? "nvidia" : null

  # Configure GPU access if GPU support is enabled
  dynamic "device_requests" {
    for_each = var.gpu_enabled ? [1] : []
    content {
      driver       = "nvidia"
      count        = var.gpu_count == "all" ? -1 : null
      device_ids   = var.gpu_count != "all" ? split(",", var.gpu_count) : null
      capabilities = [["gpu"]]
    }
  }

  host {
    host = "host.docker.internal"
    ip   = "host-gateway"
  }

  volumes {
    container_path = "/home/coder"
    volume_name    = docker_volume.home_volume.name
    read_only      = false
  }

  # Increase shared memory size, useful for deep learning workloads
  shm_size = 2048

  # Add labels in Docker to keep track of orphan resources.
  labels {
    label = "coder.owner"
    value = data.coder_workspace_owner.me.name
  }
  labels {
    label = "coder.owner_id"
    value = data.coder_workspace_owner.me.id
  }
  labels {
    label = "coder.workspace_id"
    value = data.coder_workspace.me.id
  }
  labels {
    label = "coder.workspace_name"
    value = data.coder_workspace.me.name
  }
}

# Surface GPU status information in the dashboard
resource "coder_metadata" "workspace_info" {
  count       = data.coder_workspace.me.start_count
  resource_id = docker_container.workspace[0].id

  item {
    key   = "image"
    value = docker_container.workspace[0].image
  }
  item {
    key   = "gpu_enabled"
    value = var.gpu_enabled
  }
  item {
    key   = "gpu_config"
    value = var.gpu_enabled ? var.gpu_count : "disabled"
  }
}
```