Time-Slicing GPUs with Karpenter

#git

Time-Slicing GPUs with Karpenter

Arthor: Ran Tao, Cloud architect @Jina AI
This article is originally published onJina AI News.

Today, businesses and developers are keen to use cloud for deep learning. Especially with the GPU cloud instances, you pay as you go. It is much more cost-efficient comparing to having an expensive metal machine in the office. But let's switch the role now. Say you are the GPU cloud provider, and you provide the GPU environment for hosting other users applications. The problem now becomes, how can you, as this platform provider, lower down the GPU costs to maximize the profit? This is not abou

But let's switch the role now. Say you are the GPU cloud provider, and you provide the GPU environment for hosting other users applications. The problem now becomes,how can you, as this platform provider, lower down the GPU costs to maximize the profit?

This is not about finding the cheapest GPU vendors. In fact, it isthe question we were facing at Jina AI when designing our GPU cloud platform.

Jina AI Cloud Hosting After building a Jina project, the next step is to deploy and host it on the cloud. Jina AI Cloud is Jina’s reliable, scalable and production-ready cloud-hosting solution that manages your project lifecycle without surprises or hidden development costs.

The answer istime-slicing.

💡Time-slicing allows oversubscription of GPUs. Under the hood, CUDA time-slicing is used to allow workloads that land on oversubscribed GPUs to interleave with one another. Each workload has access to the GPU memory and runs in the same fault-domain as of all the others

In this article, we will use Karpenter - an elastic node scaling method in Kubernetes and NVIDIA’s k8s plugin to achieve time-slicing on GPUs. A GPU cloud with time-slicing will allow users to share GPUs between pods, hence saves the costs.

[Karpenter](https://karpenter.sh)

[K8s-device-plugin](https://github.com/NVIDIA/k8s-device-plugin)

Karpenter itself provides an auto scaling feature to nodes, which means that you will have the GPU instance only when you need it and can schedule the node based on the instance type you configured. It saves you money and schedules nodes more effectively.

The purpose of utilizing the GPU with Karpenter is not only saving cost, but more importantly, it also provides us a flexible method to schedule GPU resources to our applications within the kubernetes cluster. You may own tens of applications which need the GPU in different time slots, how to schedule them in a more cost effective way is so important in the cloud.

Architecture

Infrastructure diagram

Component diagram

It’s pretty straightforward: the application will choose a karpenter provisioner with a selector. The karpenter provisioner will create nodes based on the launch template in that provisioner.

Deployment

Building the architect is simple, the problem we are left with is how we are going to deploy it. There are some particulars we need to think about.

How we deploy the nvidia k8s plugin to the nodes with GPU only.
How we configure the shared GPU nodes to use time-slicing without affecting others.
How do we automatically update nodes AMI in the launch template so the nodes can use the latest image.
How do we setup karpenter provisioners

Let’s do it one by one then.

First, install karpenter and setup provisioner with terraform. You can manually install karpenter in eks with an official document as well. If you already have eks with karpenter, you can skip it.

[tarrantrom](https://github.com/tarrantro/terraform)

Set provisioner

The Provisioners is set to use corelated launch templates to provision GPU nodes with labels and taints.

resource "kubectl_manifest" "karpenter_provisioner_gpu_shared" {  yaml_body = <<-YAML  apiVersion: karpenter.sh/v1alpha5  kind: Provisioner  metadata:    name: gpu-shared  spec:    ttlSecondsAfterEmpty: 300    labels:      jina.ai/node-type: gpu-shared      jina.ai/gpu-type: nvidia      nvidia.com/device-plugin.config: shared_gpu    requirements:      - key: node.kubernetes.io/instance-type        operator: In        values: ["g4dn.xlarge", "g4dn.2xlarge", "g4dn.4xlarge", "g4dn.12xlarge"]      - key: karpenter.sh/capacity-type        operator: In        values: ["spot", "on-demand"]      - key: kubernetes.io/arch        operator: In        values: ["amd64"]    taints:      - key: nvidia.com/gpu-shared        effect: "NoSchedule"    limits:      resources:        cpu: 1000    provider:      launchTemplate: "karpenter-gpu-shared-${local.cluster_name}"      subnetSelector:        karpenter.sh/discovery: ${local.cluster_name}      tags:        karpenter.sh/discovery: ${local.cluster_name}    ttlSecondsAfterEmpty: 30  YAML  depends_on = [    helm_release.karpenter  ]}resource "kubectl_manifest" "karpenter_provisioner_gpu" {  yaml_body = <<-YAML  apiVersion: karpenter.sh/v1alpha5  kind: Provisioner  metadata:    name: gpu  spec:    ttlSecondsAfterEmpty: 300    labels:      jina.ai/node-type: gpu      jina.ai/gpu-type: nvidia    requirements:      - key: node.kubernetes.io/instance-type        operator: In        values: ["g4dn.xlarge", "g4dn.2xlarge", "g4dn.4xlarge", "g4dn.12xlarge"]      - key: karpenter.sh/capacity-type        operator: In        values: ["spot", "on-demand"]      - key: kubernetes.io/arch        operator: In        values: ["amd64"]    taints:      - key: nvidia.com/gpu        effect: "NoSchedule"    limits:      resources:        cpu: 1000    provider:      launchTemplate: "karpenter-gpu-${local.cluster_name}"      subnetSelector:        karpenter.sh/discovery: ${local.cluster_name}      tags:        karpenter.sh/discovery: ${local.cluster_name}    ttlSecondsAfterEmpty: 30  YAML  depends_on = [    helm_release.karpenter  ]}

Provisioner

Launch template (only GPU):gpu_launchtemplate.hcl

Add time-slicing config

Secondly, we need to deploy the NVIDIA k8s plugin with time-slicing config and default config and set up a node selector so the daemonset will only run on the GPU instances.

config:  # ConfigMap name if pulling from an external ConfigMap  name: ""  # Set of named configs to build an integrated ConfigMap from  map:     default: |-      version: v1      flags:        migStrategy: "none"        failOnInitError: true        nvidiaDriverRoot: "/"        plugin:          passDeviceSpecs: false          deviceListStrategy: envvar          deviceIDStrategy: uuid    shared_gpu: |-      version: v1      flags:        migStrategy: "none"        failOnInitError: true        nvidiaDriverRoot: "/"        plugin:          passDeviceSpecs: false          deviceListStrategy: envvar          deviceIDStrategy: uuid      sharing:        timeSlicing:          renameByDefault: false          resources:          - name: nvidia.com/gpu            replicas: 10nodeSelector:   jina.ai/gpu-type: nvidia

nvdp.yaml

Run the below command to install NVIDIA’s k8s plugin:

helm repo add nvdp https://nvidia.github.io/k8s-device-pluginhelm repo updatehelm upgrade -i nvdp nvdp/nvidia-device-plugin \  --namespace nvidia-device-plugin \  --create-namespace -f nvdp.yaml

Deploy user application

Third, deploy the user application withnodeSelector andtoleration.

kind: DeploymentapiVersion: apps/v1metadata:  name: test-gpu  labels:    app: gpuspec:  replicas: 1  selector:    matchLabels:      app: gpu  template:    metadata:      labels:        app: gpu    spec:      nodeSelector:        jina.ai/node-type: gpu        karpenter.sh/provisioner-name: gpu      tolerations:      - key: nvidia.com/gpu        operator: Exists        effect: NoSchedule      containers:      - name: gpu-container        image: tensorflow/tensorflow:latest-gpu        imagePullPolicy: Always        command: ["python"]        args: ["-u", "-c", "import tensorflow"]        resources:          limits:            nvidia.com/gpu: 1

gpu.yml

kind: DeploymentapiVersion: apps/v1metadata:  name: test-gpu-shared  labels:    app: gpu-sharedspec:  replicas: 1  selector:    matchLabels:      app: gpu-shared  template:    metadata:      labels:        app: gpu-shared    spec:      nodeSelector:        jina.ai/node-type: gpu-shared        karpenter.sh/provisioner-name: gpu-shared      tolerations:      - key: nvidia.com/gpu-shared        operator: Exists        effect: NoSchedule      containers:      - name: gpu-container        image: tensorflow/tensorflow:latest-gpu        imagePullPolicy: Always        command: ["python"]        args: ["-u", "-c", "import tensorflow"]        resources:          limits:            nvidia.com/gpu: 1

gpu-shared.yml

Validate the results

Now, if you deploy both YAML files. You will see two nodes provisioned in AWS console or you can see via usekubectl get nodes — show-labels. After thenvidia-k8s-plugin is running in each nodes, you can test in your applications.

The result showing in the AWS EC2 console

If you like this article or want to learn more about the architecture behind Jina AI Cloud, make sure tofollow us on social channels andsubscribe to our blog.