AI Inference Orchestration with Kubernetes on Instinct MI300X, Part 1

As organizations scale their AI inference workloads, they face the challenge of efficiently deploying and managing large language models across GPU infrastructure. This three-part blog series provides a production-ready foundation for orchestrating AI inference workloads on the AMD Instinct platform with Kubernetes.
In this post, we’ll establish the essential infrastructure by setting up a Kubernetes cluster using MicroK8s, configuring Helm for streamlined deployments, implementing persistent storage for model caching, and installing the AMD GPU Operator to seamlessly integrate AMD hardware with Kubernetes.
Part 2 will focus on deploying and scaling the vLLM inference engine, implementing MetalLB for load balancing, and optimizing multi-GPU deployments to maximize the performance of AMD Instinct accelerators.
The series concludes in Part 3 with implementing Prometheus for metrics collection, Grafana for performance visualization, and Open WebUI for interacting with models deployed with vLLM.
Let’s begin with Part 1, where we’ll build the foundational components needed for a production-ready AI inference platform.
MI300X Test System Specifications
| Component | Specification |
| --- | --- |
| Node Type | Supermicro AS-8125GS-TNMR2 |
| CPU | 2x AMD EPYC 9654 96-Core Processor |
| Memory | 24x 96GB DDR5-4800 dual rank |
| GPU | 8x MI300X |
| OS | Ubuntu 22.04.4 LTS |
| Kernel | 6.5.0-45-generic |
| ROCm | 6.2.0 |
| AMD GPU driver | 6.7.0-1787201.22.04 |
Install Kubernetes (microk8s)
MicroK8s is a lightweight but powerful Kubernetes distribution that can run on as little as a single node and scale up to mid-sized clusters. Here we use snap to install MicroK8s on the host node.
```shell
sudo apt update
sudo apt install snapd
sudo snap install microk8s --classic
```
Add the current user to the `microk8s` group and create the `.kube` directory in your home directory to store the MicroK8s config file.
```shell
sudo usermod -a -G microk8s $USER
mkdir -p ~/.kube
chmod 0700 ~/.kube
```
Register the `microk8s` group you just added in your current shell:

```shell
newgrp microk8s
```
Note

For convenience, you can add the following alias to your `.bashrc` to use the native `kubectl` command with MicroK8s:

```shell
echo "alias kubectl='microk8s kubectl'" >> ~/.bashrc; source ~/.bashrc
```
Let’s confirm our cluster is up and running.
```shell
kubectl get nodes
```

We should see the STATUS as Ready:

```
NAME              STATUS   ROLES    AGE   VERSION
mi300x-server01   Ready    <none>   5m    v1.31.3
```
Since this is a single-node instance of Kubernetes, we will need to label our node as a control-plane node in order for the AMD GPU Operator to successfully find the node.
Note
For vanilla Kubernetes installations this is not required, but if you're running Kubernetes on a single master node you will need to remove the taint to be able to schedule jobs on this node.
We can do this in one line:

```shell
kubectl label node $(kubectl get nodes --no-headers | grep "<none>" | awk '{print $1}') node-role.kubernetes.io/control-plane=''
```
Let’s confirm the node has been labeled correctly
```shell
kubectl get nodes
```

We should see the ROLES as control-plane:

```
NAME              STATUS   ROLES           AGE   VERSION
mi300x-server01   Ready    control-plane   8m    v1.31.3
```
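For the single-node vanilla Kubernetes case mentioned in the note above, the equivalent step is removing the control-plane taint rather than adding a label. A minimal sketch, using the standard kubeadm-style taint name (the trailing `-` removes the taint; the `|| true` guard makes this a no-op on clusters where the taint is already absent):

```shell
# Single-node vanilla Kubernetes only (not needed for MicroK8s):
# remove the control-plane taint so regular pods can be scheduled on
# the master node. The trailing '-' deletes the taint.
kubectl taint nodes --all node-role.kubernetes.io/control-plane- 2>/dev/null || true
```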
Install Helm
The AMD GPU Operator, Grafana, and Prometheus installations are facilitated by Helm charts. To install the latest version of Helm, we download the install script from the Helm repository and run it:
```shell
curl -fsSL -o get_helm.sh https://raw.githubusercontent.com/helm/helm/main/scripts/get-helm-3
chmod 700 get_helm.sh
./get_helm.sh
```

If you are using MicroK8s as your Kubernetes instance, you also need to point the `KUBECONFIG` environment variable at the MicroK8s config file so Helm can find the cluster. This is accomplished by exporting the environment variable in the `.bashrc` of the current user:

```shell
echo "export KUBECONFIG=/var/snap/microk8s/current/credentials/client.config" >> ~/.bashrc; source ~/.bashrc
```
Note
For vanilla Kubernetes installations this is not required unless you also wish to change the default location, `$HOME/.kube/config`.
Persistent Storage
Next, we enable persistent storage so that we don’t have to keep downloading the model when starting or scaling up an instance of the vLLM inference server. Here are methods for both microk8s and vanilla Kubernetes.
MicroK8s
Enable storage on microk8s with:
```shell
microk8s enable storage
```

Create a persistent volume claim (PVC) using `vllm-pvc.yaml`:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim   # Defines a PersistentVolumeClaim (PVC) resource
metadata:
  name: llama-3.2-1b          # Name of the PVC, used for reference in deployments
  namespace: default          # Kubernetes namespace where this PVC resides
spec:
  accessModes:
    - ReadWriteOnce           # Volume can be mounted as read-write by a single node
  resources:
    requests:
      storage: 50Gi           # Amount of storage requested for this volume
  storageClassName: microk8s-hostpath  # Storage class to use (MicroK8s hostPath)
  volumeMode: Filesystem      # The volume will be formatted as a filesystem
```
Apply the persistent volume claim in the console with:
```shell
kubectl apply -f vllm-pvc.yaml
```
Note
For MicroK8s, the hostpath provisioner will store all volume data in `/var/snap/microk8s/common/default-storage` on the host machine.
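Before creating the PVC, it can be worth checking that this location has enough free space for the 50Gi claim. A quick sketch, which falls back to the root filesystem on machines where the MicroK8s path does not exist:

```shell
# Show free space at the MicroK8s hostpath storage location; if the path
# does not exist on this machine, fall back to the root filesystem.
STORAGE_DIR=/var/snap/microk8s/common/default-storage
df -h "$STORAGE_DIR" 2>/dev/null || df -h /
```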
Vanilla Kubernetes
In environments without dynamic storage provisioning, define a PersistentVolume (PV) that the PVC can bind to with `pv.yaml`:
```yaml
apiVersion: v1
kind: PersistentVolume
metadata:
  name: llama-3.2-1b-pv    # Name of the PersistentVolume
  namespace: default
spec:
  capacity:
    storage: 50Gi          # Amount of storage available in this PV
  accessModes:
    - ReadWriteOnce        # Access mode for the PV
  hostPath:
    path: /mnt/data/llama  # Path on the host where the data is stored
  volumeMode: Filesystem   # The volume will be formatted as a filesystem
```
Define a PersistentVolumeClaim (PVC) with `vllm-pvc.yaml`:
```yaml
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: llama-3.2-1b
  namespace: default
spec:
  accessModes:
    - ReadWriteOnce        # Volume can be mounted as read-write by a single node
  resources:
    requests:
      storage: 50Gi        # Amount of storage requested
  volumeMode: Filesystem   # The volume will be formatted as a filesystem
```
Note
The PVC will bind to the specific PV based on matching `storage` and `accessModes` attributes.
Apply the two manifests:
```shell
kubectl apply -f pv.yaml
kubectl apply -f vllm-pvc.yaml
```
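To sanity-check the claim, a throwaway pod can mount it. The manifest below is a hypothetical example (the pod name, volume name, and mount path are all illustrative); only `claimName` matches the PVC defined above:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: pvc-smoke-test          # illustrative name
  namespace: default
spec:
  containers:
    - name: shell
      image: busybox
      command: ["sh", "-c", "ls -la /models && sleep 3600"]
      volumeMounts:
        - name: model-store
          mountPath: /models    # files written here land on the persistent volume
  volumes:
    - name: model-store         # illustrative volume name
      persistentVolumeClaim:
        claimName: llama-3.2-1b # the PVC created above
```

After applying it, `kubectl exec` into the pod and write a file under `/models`; it should survive deleting and re-creating the pod.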
Install the AMD GPU Operator
We proceed with installing the AMD GPU Operator. First, we need to install cert-manager as a prerequisite by adding the Jetstack repo to Helm.
```shell
helm repo add jetstack https://charts.jetstack.io
```
Then run the install for cert-manager via Helm:
```shell
helm install cert-manager jetstack/cert-manager \
  --namespace cert-manager \
  --create-namespace \
  --version v1.15.1 \
  --set crds.enabled=true
```
Finally, install the AMD GPU Operator.
```shell
# Add the Helm repository
helm repo add rocm https://rocm.github.io/gpu-operator
helm repo update

# Install the GPU Operator
helm install amd-gpu-operator rocm/gpu-operator-charts \
  --namespace kube-amd-gpu --create-namespace
```
Configure AMD GPU Operator
For our workload we will use a default `device-config.yaml` file to configure the AMD GPU Operator. This sets up the device plugin, node labeller, and metrics exporter to properly assign AMD GPUs to workloads, label the nodes that have AMD GPUs, and export metrics for Grafana and Prometheus. For a full list of configurable options, refer to the Full Reference Config page of the AMD GPU Operator documentation.
```yaml
apiVersion: amd.com/v1alpha1
kind: DeviceConfig
metadata:
  name: gpu-operator
  # use the namespace where the AMD GPU Operator is running
  namespace: kube-amd-gpu
spec:
  driver:
    # disable the installation of the out-of-tree amdgpu kernel module
    enable: false
  devicePlugin:
    # Specify the device plugin image
    # default value is rocm/k8s-device-plugin:latest
    devicePluginImage: rocm/k8s-device-plugin:latest
    # Specify the node labeller image
    # default value is rocm/k8s-device-plugin:labeller-latest
    nodeLabellerImage: rocm/k8s-device-plugin:labeller-latest
    # Specify to enable/disable the node labeller
    # node labeller is required for adding/removing the blacklist config of the amdgpu kernel module
    # please set to true if you want to blacklist the inbox driver and use the out-of-tree driver
    enableNodeLabeller: true
  metricsExporter:
    # To enable/disable the metrics exporter, disabled by default
    enable: true
    # kubernetes service type for the metrics exporter, ClusterIP (default) or NodePort
    serviceType: "NodePort"
    # internal service port used for in-cluster and node access to pull metrics from the metrics-exporter (default 5000)
    port: 5000
    # node port for the metrics exporter service, metrics endpoint $node-ip:$nodePort
    nodePort: 32500
    # exporter image
    image: "docker.io/rocm/device-metrics-exporter:v1.0.0"
  # Specify the nodes to be managed by this DeviceConfig Custom Resource
  selector:
    feature.node.kubernetes.io/amd-gpu: "true"
```
We apply the device-config file as:
```shell
kubectl apply -f device-config.yaml
```
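With `serviceType: "NodePort"` and `nodePort: 32500` from the config above, the exporter's metrics become reachable at `$node-ip:32500/metrics`. A small sketch of building that URL (the IP below is a placeholder from the TEST-NET-1 example range; on a live cluster, substitute a real node's InternalIP, e.g. from `kubectl get nodes -o wide`):

```shell
# Build the metrics endpoint URL exposed by the metrics exporter NodePort.
NODE_IP="192.0.2.10"   # placeholder; replace with a real node InternalIP
NODE_PORT=32500        # must match metricsExporter.nodePort in device-config.yaml
echo "Metrics endpoint: http://${NODE_IP}:${NODE_PORT}/metrics"
```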
To confirm the node labeller is working, we run:
```shell
kubectl get nodes -L feature.node.kubernetes.io/amd-gpu
```
and we should see AMD-GPU set as true:
```
NAME              STATUS   ROLES           AGE   VERSION   AMD-GPU
mi300x-server01   Ready    control-plane   15m   v1.31.3   true
```

To show available GPUs for workloads we can use:
```shell
kubectl get nodes -o custom-columns=NAME:.metadata.name,"Total GPUs:.status.capacity.amd\.com/gpu","Allocatable GPUs:.status.allocatable.amd\.com/gpu"
```
Now, we can see the total available GPUs
```
NAME              Total GPUs   Allocatable GPUs
mi300x-server01   8            8
```
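The per-node counts can also be pulled out of `kubectl get nodes -o json` programmatically, e.g. to sum allocatable GPUs across a multi-node cluster. The sketch below runs against a hypothetical saved sample of that JSON so it works offline; on a live cluster you would redirect `kubectl get nodes -o json` into the file instead:

```shell
# Hypothetical sample of `kubectl get nodes -o json`, trimmed to the fields
# we need; on a real cluster: kubectl get nodes -o json > nodes.json
cat > nodes.json <<'EOF'
{"items":[{"metadata":{"name":"mi300x-server01"},
           "status":{"allocatable":{"amd.com/gpu":"8"}}}]}
EOF

# Sum allocatable amd.com/gpu across all nodes in the JSON
python3 - <<'PY'
import json
nodes = json.load(open("nodes.json"))
total = sum(int(n["status"]["allocatable"].get("amd.com/gpu", "0"))
            for n in nodes["items"])
print(f"Total allocatable AMD GPUs: {total}")  # 8 for the sample above
PY
```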
Summary
In this post, we’ve established a solid foundation for AI inference workloads by:
- Setting up a Kubernetes cluster with MicroK8s
- Configuring essential components like Helm
- Implementing persistent storage for model management
- Installing and validating the AMD GPU Operator
The next installment in this series will walk through deploying and scaling vLLM for inference, implementing MetalLB for load balancing, and optimizing multi-GPU deployments on AMD Instinct hardware. Part 3 will round out the series by deploying Open WebUI as a front end and configuring monitoring and management with Prometheus and Grafana. Stay tuned!
Disclaimers
Third-party content is licensed to you directly by the third party that owns the content and is not licensed to you by AMD. ALL LINKED THIRD-PARTY CONTENT IS PROVIDED “AS IS” WITHOUT A WARRANTY OF ANY KIND. USE OF SUCH THIRD-PARTY CONTENT IS DONE AT YOUR SOLE DISCRETION AND UNDER NO CIRCUMSTANCES WILL AMD BE LIABLE TO YOU FOR ANY THIRD-PARTY CONTENT. YOU ASSUME ALL RISK AND ARE SOLELY RESPONSIBLE FOR ANY DAMAGES THAT MAY ARISE FROM YOUR USE OF THIRD-PARTY CONTENT.