How to create a GPU-enabled development environment with Coder using Docker provider #18722

Unanswered
iomgaa-ycz asked this question in General

Issue Title

How to create a GPU-enabled development environment with Coder using Docker provider

Description

I'm trying to create a Coder template that provides a GPU-accelerated development environment for machine learning work. I want to build a custom Docker image with CUDA support and have it accessible through Coder's web interface.

Environment

  • OS: Ubuntu 20.04
  • GPU: NVIDIA RTX 4090
  • Docker: GPU support verified working
  • Coder: 2.24
  • Terraform: 1.12.2

GPU Environment Verification

I've confirmed that my Docker + GPU setup is working correctly:

docker run --gpus all --rm pytorch/manylinux-cuda118:latest nvidia-smi

This command successfully shows GPU information, confirming that:

  • NVIDIA Docker runtime is properly configured
  • GPU passthrough to containers works
  • CUDA drivers are accessible from within containers
Goals

  1. Create a Coder template that builds a custom Docker image with:

    • NVIDIA CUDA 11.8 support
    • Python development environment with PyTorch, Jupyter Lab, etc.
    • GPU monitoring and resource tracking
    • VS Code and JetBrains IDE integration
  2. The workspace should:

    • Have GPU access (nvidia-smi should work inside the workspace)
    • Provide web access to VS Code

Current Issues

I'm encountering several challenges:

  1. When I created a workspace from the template, the resulting container did not have GPU access (`nvidia-smi` does not work inside the workspace).
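To narrow down where the GPU request is being dropped, I inspect the created container from the host (`docker inspect <container>`) and check whether `HostConfig.Runtime` and `HostConfig.DeviceRequests` were actually set. A small helper sketch for interpreting that output (the container name and the trimmed sample payload below are illustrative, not from my actual setup):

```python
import json

def gpu_config(inspect_json: str) -> dict:
    """Summarize GPU-related settings from `docker inspect <container>` output."""
    data = json.loads(inspect_json)[0]  # docker inspect prints a one-element list
    host_cfg = data.get("HostConfig", {})
    return {
        "runtime": host_cfg.get("Runtime"),
        "device_requests": host_cfg.get("DeviceRequests") or [],
    }

# Trimmed-down sample payload: a container that really got a GPU should show
# runtime "nvidia" and/or a non-empty DeviceRequests list.
sample = (
    '[{"HostConfig": {"Runtime": "nvidia", '
    '"DeviceRequests": [{"Driver": "nvidia", "Count": -1, '
    '"Capabilities": [["gpu"]]}]}}]'
)
print(gpu_config(sample))
```

If both fields come back empty for the workspace container, the GPU request never made it from the template into the Docker API call.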

Are there any existing examples or community templates for GPU-enabled Coder workspaces that I could reference?

Any help, examples, or guidance would be greatly appreciated!

```hcl
terraform {
  required_providers {
    coder = {
      source = "coder/coder"
    }
    docker = {
      source = "kreuzwerker/docker"
    }
  }
}

locals {
  username = data.coder_workspace_owner.me.name
}

variable "docker_socket" {
  default     = ""
  description = "(Optional) Docker socket URI"
  type        = string
}

variable "gpu_enabled" {
  default     = true
  description = "Enable GPU support for the workspace"
  type        = bool
}

variable "gpu_count" {
  default     = "all"
  description = "Number of GPUs to allocate (use 'all' for all GPUs, or specify device IDs like '0,1')"
  type        = string
}

provider "docker" {
  # Defaulting to null if the variable is an empty string lets us have an optional
  # variable without having to set our own default
  host = var.docker_socket != "" ? var.docker_socket : null
}

data "coder_provisioner" "me" {}
data "coder_workspace" "me" {}
data "coder_workspace_owner" "me" {}

resource "coder_agent" "main" {
  arch           = data.coder_provisioner.me.arch
  os             = "linux"
  startup_script = <<-EOT
    set -e

    # Create coder user if it doesn't exist
    if ! id "coder" &>/dev/null; then
        useradd --create-home --shell=/bin/bash --groups=sudo coder
        echo "coder ALL=(ALL) NOPASSWD:ALL" >> /etc/sudoers.d/90-coder
    fi

    # Ensure coder user owns the home directory
    chown -R coder:coder /home/coder

    # Switch to coder user for the rest of the setup
    sudo -u coder bash << 'EOF'
    # Prepare user home with default files on first start.
    if [ ! -f ~/.init_done ]; then
      # Create basic shell configuration
      echo 'export PATH=$PATH:/usr/local/bin' >> ~/.bashrc
      echo 'alias ll="ls -la"' >> ~/.bashrc

      # Check GPU availability
      if command -v nvidia-smi &> /dev/null; then
        echo "GPU detected:"
        nvidia-smi
        echo 'export CUDA_VISIBLE_DEVICES=all' >> ~/.bashrc
      else
        echo "No GPU detected or nvidia-smi not available"
      fi

      # Install basic Python packages
      if command -v pip &> /dev/null; then
        pip install --user jupyter notebook ipython
      fi

      touch ~/.init_done
    fi
    EOF

    # Install basic development tools
    apt-get update
    apt-get install -y curl wget git vim nano htop tree sudo

    echo "Workspace setup completed!"
  EOT

  # These environment variables allow you to make Git commits right away after creating a
  # workspace. Note that they take precedence over configuration defined in ~/.gitconfig!
  env = {
    GIT_AUTHOR_NAME     = coalesce(data.coder_workspace_owner.me.full_name, data.coder_workspace_owner.me.name)
    GIT_AUTHOR_EMAIL    = "${data.coder_workspace_owner.me.email}"
    GIT_COMMITTER_NAME  = coalesce(data.coder_workspace_owner.me.full_name, data.coder_workspace_owner.me.name)
    GIT_COMMITTER_EMAIL = "${data.coder_workspace_owner.me.email}"
    # GPU-related environment variables
    NVIDIA_VISIBLE_DEVICES = var.gpu_enabled ? "all" : ""
    CUDA_VISIBLE_DEVICES   = var.gpu_enabled ? "all" : ""
  }

  # The following metadata blocks are optional. They are used to display
  # information about your workspace in the dashboard.
  metadata {
    display_name = "CPU Usage"
    key          = "0_cpu_usage"
    script       = "coder stat cpu"
    interval     = 10
    timeout      = 1
  }
  metadata {
    display_name = "RAM Usage"
    key          = "1_ram_usage"
    script       = "coder stat mem"
    interval     = 10
    timeout      = 1
  }
  metadata {
    display_name = "GPU Usage"
    key          = "2_gpu_usage"
    script       = <<EOT
      if command -v nvidia-smi &> /dev/null; then
        nvidia-smi --query-gpu=utilization.gpu --format=csv,noheader,nounits | head -1 | xargs printf "%s%%"
      else
        echo "No GPU"
      fi
    EOT
    interval     = 10
    timeout      = 1
  }
  metadata {
    display_name = "GPU Memory"
    key          = "3_gpu_memory"
    script       = <<EOT
      if command -v nvidia-smi &> /dev/null; then
        nvidia-smi --query-gpu=memory.used,memory.total --format=csv,noheader,nounits | head -1 | awk '{printf "%.1f/%.1f GB", $1/1024, $2/1024}'
      else
        echo "No GPU"
      fi
    EOT
    interval     = 10
    timeout      = 1
  }
  metadata {
    display_name = "Home Disk"
    key          = "4_home_disk"
    script       = "coder stat disk --path $${HOME}"
    interval     = 60
    timeout      = 1
  }
  metadata {
    display_name = "CPU Usage (Host)"
    key          = "5_cpu_usage_host"
    script       = "coder stat cpu --host"
    interval     = 10
    timeout      = 1
  }
  metadata {
    display_name = "Memory Usage (Host)"
    key          = "6_mem_usage_host"
    script       = "coder stat mem --host"
    interval     = 10
    timeout      = 1
  }
  metadata {
    display_name = "Load Average (Host)"
    key          = "7_load_host"
    script       = <<EOT
      echo "`cat /proc/loadavg | awk '{ print $1 }'` `nproc`" | awk '{ printf "%0.2f", $1/$2 }'
    EOT
    interval = 60
    timeout  = 1
  }
  metadata {
    display_name = "Swap Usage (Host)"
    key          = "8_swap_host"
    script       = <<EOT
      free -b | awk '/^Swap/ { printf("%.1f/%.1f", $3/1024.0/1024.0/1024.0, $2/1024.0/1024.0/1024.0) }'
    EOT
    interval     = 10
    timeout      = 1
  }
}

# See https://registry.coder.com/modules/coder/code-server
module "code-server" {
  count    = data.coder_workspace.me.start_count
  source   = "registry.coder.com/modules/code-server/coder"
  version  = "~> 1.0"
  agent_id = coder_agent.main.id
  order    = 1
}

# See https://registry.coder.com/modules/coder/jetbrains-gateway
module "jetbrains_gateway" {
  count   = data.coder_workspace.me.start_count
  source  = "registry.coder.com/modules/jetbrains-gateway/coder"
  version = "~> 1.0"

  # JetBrains IDEs to make available for the user to select
  jetbrains_ides = ["IU", "PS", "WS", "PY", "CL", "GO", "RM", "RD", "RR"]
  default        = "PY" # PyCharm Professional by default, well suited to GPU development

  # Default folder to open when starting a JetBrains IDE
  folder = "/home/coder"

  agent_id   = coder_agent.main.id
  agent_name = "main"
  order      = 2
}

resource "docker_volume" "home_volume" {
  name = "coder-${data.coder_workspace.me.id}-home"
  # Protect the volume from being deleted due to changes in attributes.
  lifecycle {
    ignore_changes = all
  }
  # Add labels in Docker to keep track of orphan resources.
  labels {
    label = "coder.owner"
    value = data.coder_workspace_owner.me.name
  }
  labels {
    label = "coder.owner_id"
    value = data.coder_workspace_owner.me.id
  }
  labels {
    label = "coder.workspace_id"
    value = data.coder_workspace.me.id
  }
  labels {
    label = "coder.workspace_name_at_creation"
    value = data.coder_workspace.me.name
  }
}

resource "docker_container" "workspace" {
  count = data.coder_workspace.me.start_count

  # Use a GPU-capable PyTorch image
  image = "pytorch/manylinux-cuda118:latest"

  # Uses lower() to avoid Docker restriction on container names.
  name = "coder-${data.coder_workspace_owner.me.name}-${lower(data.coder_workspace.me.name)}"

  # Hostname makes the shell more user friendly: coder@my-workspace:~$
  hostname = data.coder_workspace.me.name

  # Use the docker gateway if the access URL is 127.0.0.1
  entrypoint = ["sh", "-c", replace(coder_agent.main.init_script, "/localhost|127\\.0\\.0\\.1/", "host.docker.internal")]

  env = [
    "CODER_AGENT_TOKEN=${coder_agent.main.token}",
    "NVIDIA_VISIBLE_DEVICES=${var.gpu_enabled ? var.gpu_count : ""}",
    "CUDA_VISIBLE_DEVICES=${var.gpu_enabled ? var.gpu_count : ""}",
    "NVIDIA_DRIVER_CAPABILITIES=compute,utility"
  ]

  # GPU configuration
  runtime = var.gpu_enabled ? "nvidia" : null

  # If GPU support is enabled, configure GPU access
  dynamic "device_requests" {
    for_each = var.gpu_enabled ? [1] : []
    content {
      driver       = "nvidia"
      count        = var.gpu_count == "all" ? -1 : null
      device_ids   = var.gpu_count != "all" ? split(",", var.gpu_count) : null
      capabilities = [["gpu"]]
    }
  }

  host {
    host = "host.docker.internal"
    ip   = "host-gateway"
  }

  volumes {
    container_path = "/home/coder"
    volume_name    = docker_volume.home_volume.name
    read_only      = false
  }

  # Increase shared memory size; useful for deep-learning data loaders
  shm_size = 2048

  # Add labels in Docker to keep track of orphan resources.
  labels {
    label = "coder.owner"
    value = data.coder_workspace_owner.me.name
  }
  labels {
    label = "coder.owner_id"
    value = data.coder_workspace_owner.me.id
  }
  labels {
    label = "coder.workspace_id"
    value = data.coder_workspace.me.id
  }
  labels {
    label = "coder.workspace_name"
    value = data.coder_workspace.me.name
  }
}

# Expose GPU status information in the dashboard
resource "coder_metadata" "workspace_info" {
  count       = data.coder_workspace.me.start_count
  resource_id = docker_container.workspace[0].id

  item {
    key   = "image"
    value = docker_container.workspace[0].image
  }

  item {
    key   = "gpu_enabled"
    value = var.gpu_enabled
  }

  item {
    key   = "gpu_config"
    value = var.gpu_enabled ? var.gpu_count : "disabled"
  }
}
```
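One thing I'm unsure about: whether the kreuzwerker/docker provider actually honors a `device_requests` block on `docker_container`. The provider does document a `gpus` argument (which, per its docs, currently only accepts the value `"all"`). A minimal hypothetical variant of the container resource relying on that argument instead:

```hcl
# Sketch, not my working template: use the provider's documented `gpus`
# argument rather than a device_requests block.
resource "docker_container" "workspace" {
  count = data.coder_workspace.me.start_count
  image = "pytorch/manylinux-cuda118:latest"
  name  = "coder-${data.coder_workspace_owner.me.name}-${lower(data.coder_workspace.me.name)}"

  gpus    = var.gpu_enabled ? "all" : null
  runtime = var.gpu_enabled ? "nvidia" : null

  env = [
    "CODER_AGENT_TOKEN=${coder_agent.main.token}",
    "NVIDIA_DRIVER_CAPABILITIES=compute,utility",
  ]
}
```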

Replies: 1 comment


I have a few that I am not actively maintaining at https://github.com/matifali/coder-templates. Let me know if they work for you.

2 participants: @iomgaa-ycz, @matifali
