Maximize GPU network bandwidth in Standard mode clusters
This page is intended for machine learning (ML) engineers and platform administrators who facilitate ML workloads. To learn more about common roles and example tasks that we reference in Google Cloud content, see Common GKE user roles and tasks.
Artificial intelligence (AI), ML, and high-performance computing (HPC) applications require powerful acceleration to optimize performance by reducing job completion times. For example, ML models that focus on conversational AI and image generation require high scalability and compute power.
Before reading this page, ensure that you're familiar with networking technologies, such as network interface cards (NICs) and TCP, and with accelerator technologies like the NVIDIA Collective Communications Library (NCCL).
About Google Cloud GPU supercomputers
Google Cloud has accelerator-optimized supercomputers that are built for scalable, massive models. These machines have the following benefits:
- Eight NVIDIA B200, H200, or H100 GPUs per machine.
- Up to 200 Gbps bandwidth on the primary NIC.
- Secondary NICs (up to eight on A3 Mega machine types and up to four on A3 High machine types), each supporting up to 200 Gbps bandwidth for GPU data transfer. On A3 High machine types, the expected bandwidth per NIC is approximately 150 Gbps.
Your GKE workload must use all available GPUs and all available secondary NICs on a single node, and use a significant portion of the available bandwidth. The solution described in this document is ideal for workloads that require high performance, high throughput, and low latency.
Required features and capabilities for maximized bandwidth
To maximize your network bandwidth in GPU supercomputer nodes, use all of the following features:
- GPUDirect networking stack: The A3 machine series supports three networking stacks for custom remote direct memory access (RDMA):
  - On A3 High machine types and NVIDIA H100 GPUs, use GPUDirect-TCPX to reduce the overhead required to transfer packet payloads to and from GPUs, which significantly improves throughput at scale compared to GPUs that don't use GPUDirect.
  - On A3 Mega machine types and NVIDIA H100 Mega GPUs, use GPUDirect-TCPXO, which further improves GPU-to-VM communication.
  - On A3 Ultra machine types and NVIDIA H200 GPUs, and A4 machine types and NVIDIA B200 GPUs, use GPUDirect RDMA to run distributed AI workloads with further throughput improvements. To get started, create a custom AI-optimized GKE cluster.
- gVNIC: Enables GPUDirect capabilities such as packet header splitting, flow steering, and buffer management. gVNIC is required to use GPUDirect-TCPX or GPUDirect-TCPXO. For details about gVNIC, see Increase network traffic speed for GPU nodes.
- Multi-networking: Add secondary NICs to the accelerator-optimized machine. Each NIC is associated with a separate subnet in its own VPC to avoid conflicts. For details about multi-network support, see Set up multi-network support for Pods.
- Placement policies: Use a resource placement policy to place all GPU nodes for a specific workload on physically close servers to minimize latency. For details, see Define compact placement for GKE nodes.
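For example, a compact placement resource policy can be created ahead of time and then referenced when you create the GPU node pool later in this guide. The following is a minimal sketch; the policy name a3-compact and the region us-central1 are example values, not requirements:

```bash
# Minimal sketch: create a compact placement resource policy that a GPU node
# pool can later reference with --placement-policy. The policy name and
# region are example values.
gcloud compute resource-policies create group-placement a3-compact \
    --collocation=collocated \
    --region=us-central1
```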
Procedure outline
To use all of these capabilities together, you'll do the following:
- Create Virtual Private Cloud (VPC) networks and subnets.
- Create the GKE environment.
- Install the GPUDirect binary and the NCCL plugin
- Deploy the NRI device injector plugin
- Deploy a test workload to verify GPUDirect setup
Before you begin
Before you start, make sure that you have performed the following tasks:
- Enable the Google Kubernetes Engine API.
- If you want to use the Google Cloud CLI for this task, install and then initialize the gcloud CLI. If you previously installed the gcloud CLI, get the latest version by running the gcloud components update command. Earlier gcloud CLI versions might not support running the commands in this document.
Note: For existing gcloud CLI installations, make sure to set the compute/region property. If you primarily use zonal clusters, set the compute/zone property instead. By setting a default location, you can avoid errors in the gcloud CLI like the following: One of [--zone, --region] must be supplied: Please specify location. You might need to specify the location in certain commands if the location of your cluster differs from the default that you set.
- Ensure that you have capacity for A3 Mega or A3 High VMs. To obtain this capacity, first choose from the consumption options. To follow the instructions on this page, you can use on-demand capacity, on-demand reservations, future reservations, or future reservations for up to 90 days (in calendar mode). After you've chosen a consumption option, follow the respective instructions to obtain capacity with that option.
- Ensure that you have enough quota for H100 GPUs. To request more quota, see GPU quotas.
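If you want to check your current quota from the command line, one option is to print the quota metrics for your region and look for the H100 GPU entry; a minimal sketch, assuming the region us-central1:

```bash
# Minimal sketch: print the quota metrics for a region so you can check the
# H100 GPU quota before creating node pools. The region is an example value.
gcloud compute regions describe us-central1 --format="yaml(quotas)"
```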
Requirements
The following requirements apply to both GPUDirect-TCPX and GPUDirect-TCPXO unless otherwise indicated.
GPUDirect-TCPX is supported on GKE version 1.27 or later with specific patch versions, and requires:
- The a3-highgpu-8g machine type.
- For GKE version 1.27, use GKE patch version 1.27.7-gke.1121000 or later.
- For GKE version 1.28, use GKE patch version 1.28.10-gke.1141000 or later.
- For GKE version 1.29, use GKE patch version 1.29.5-gke.1121000 or later.
- For GKE versions 1.30 to 1.33, use any patch version.
- Don't use GKE version 1.34 or later. For more information, see the known issue GPUDirect-TCPX is unavailable with A3 High for GKE version 1.34 and later.
GPUDirect-TCPXO is supported on GKE version 1.28 or later and requires:
- The a3-megagpu-8g machine type.
- For GKE version 1.28, use GKE patch version 1.28.9-gke.1250000 or later.
- For GKE version 1.29, use GKE patch version 1.29.4-gke.1542000 or later.
- For GKE version 1.30, use GKE patch version 1.30.4-gke.1129000 or later.
- For GKE version 1.31, use GKE patch version 1.31.1-gke.2008000 or later.
- For GKE version 1.32, use GKE patch version 1.32.2-gke.1489001 or later.
In addition, both technologies require the following:
- The GKE node must use a Container-Optimized OS (COS) node image. Ubuntu and Windows node images are not supported.
- Your GPU nodes must use NVIDIA driver version 535 or later.
- You must use GKE Dataplane V2.
- For GPUDirect-TCPX or GPUDirect-TCPXO workloads that run across multiple node pools, all of the node pools must be in the same Compute Engine zones and must use the same network sets, such as VPCs and subnets.
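If you plan to reuse an existing cluster, you can check whether it already uses GKE Dataplane V2 by inspecting its datapath provider; a minimal sketch in which CLUSTER_NAME and LOCATION are placeholders, and where a value of ADVANCED_DATAPATH indicates Dataplane V2:

```bash
# Minimal sketch: a cluster that uses GKE Dataplane V2 reports the
# ADVANCED_DATAPATH datapath provider. CLUSTER_NAME and LOCATION are
# placeholders for your own cluster and its location.
gcloud container clusters describe CLUSTER_NAME \
    --location=LOCATION \
    --format="value(networkConfig.datapathProvider)"
```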
Limitations
The following limitations apply:
- GPUDirect-TCPX and GPUDirect-TCPXO are not supported with multi-instance GPUs, GPU time-sharing, or NVIDIA MPS.
- You can't use NCCL FastSocket with GPUDirect-TCPX or GPUDirect-TCPXO.
- Your GKE workload must use all available GPUs and all available secondary NICs on a single node. Multiple Pods cannot use GPUDirect-TCPX or GPUDirect-TCPXO on a single node.
- You can only use the a3-highgpu-8g and a3-megagpu-8g machine types. Other A3 machine types aren't supported.
Create VPCs and subnets
Create separate VPC networks in your project for each virtual NIC that you'll add to your nodes. Each VPC network must have a subnet and a firewall rule that allows internal network traffic.
Create the VPC networks for GPUDirect in your project, each with a subnet and a firewall rule. Choose the GPUDirect-TCPX tab for A3 High machine types, or choose the GPUDirect-TCPXO tab for A3 Mega machine types, then complete the following instructions:
GPUDirect-TCPXO
To maximize your bandwidth, we recommend that you create eight new networks.
for N in $(seq 1 8); do
  gcloud compute networks create PREFIX-net-$N \
      --subnet-mode=custom \
      --mtu=8244

  gcloud compute networks subnets create PREFIX-sub-$N \
      --network=PREFIX-net-$N \
      --region=REGION \
      --range=SUBNET_RANGE

  gcloud compute firewall-rules create PREFIX-internal-$N \
      --network=PREFIX-net-$N \
      --action=ALLOW \
      --rules=tcp:0-65535,udp:0-65535,icmp \
      --source-ranges=SOURCE_RANGE
done

Replace the following:
- PROJECT_ID: your Google Cloud project ID.
- REGION: the Compute Engine region for each subnet.
- SUBNET_RANGE: the IP address range of each subnet in CIDR notation. This example command iterates over eight subnets, so you should use a variable to change the IP address range for each subnet. For example, specify 192.168.$N.0/24 so that the first subnet uses 192.168.1.0/24, the second subnet uses 192.168.2.0/24, and so on.
- SOURCE_RANGE: the source IP address range for the firewall rule to allow ingress traffic, in CIDR notation. For example, 192.168.0.0/16.
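For example, with concrete values substituted in, the loop might look like the following sketch. The prefix a3mega, the region us-central1, and the IP ranges shown are illustrative assumptions; adjust them to your environment.

```bash
# Illustrative sketch of the loop above with example values filled in.
# PREFIX, REGION, and the IP ranges are assumptions, not requirements.
PREFIX=a3mega
REGION=us-central1
for N in $(seq 1 8); do
  gcloud compute networks create ${PREFIX}-net-${N} \
      --subnet-mode=custom \
      --mtu=8244
  gcloud compute networks subnets create ${PREFIX}-sub-${N} \
      --network=${PREFIX}-net-${N} \
      --region=${REGION} \
      --range=192.168.${N}.0/24
  gcloud compute firewall-rules create ${PREFIX}-internal-${N} \
      --network=${PREFIX}-net-${N} \
      --action=ALLOW \
      --rules=tcp:0-65535,udp:0-65535,icmp \
      --source-ranges=192.168.0.0/16
done
```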
GPUDirect-TCPX
To maximize your bandwidth, we recommend that you create four new networks.
for N in $(seq 1 4); do
  gcloud compute networks create PREFIX-net-$N \
      --subnet-mode=custom \
      --mtu=8244

  gcloud compute networks subnets create PREFIX-sub-$N \
      --network=PREFIX-net-$N \
      --region=REGION \
      --range=SUBNET_RANGE

  gcloud compute firewall-rules create PREFIX-internal-$N \
      --network=PREFIX-net-$N \
      --action=ALLOW \
      --rules=tcp:0-65535,udp:0-65535,icmp \
      --source-ranges=SOURCE_RANGE
done

Replace the following:
- PROJECT_ID: your Google Cloud project ID.
- REGION: the Compute Engine region for each subnet.
- SUBNET_RANGE: the IP address range of each subnet in CIDR notation. This example command iterates over four subnets, so you should use a variable to change the IP address range for each subnet. For example, specify 192.168.$N.0/24 so that the first subnet uses 192.168.1.0/24, the second subnet uses 192.168.2.0/24, and so on.
- SOURCE_RANGE: the source IP address range for the firewall rule to allow ingress traffic, in CIDR notation. For example, 192.168.0.0/16.
Verify that the networks were created:
gcloud compute networks list
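If your project contains many networks, you can narrow the output to the networks you just created; a minimal sketch, assuming the example prefix a3mega:

```bash
# Minimal sketch: list only the GPUDirect networks, assuming they were
# created with the example prefix a3mega.
gcloud compute networks list --filter="name~'^a3mega-net'"
```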
Create the GKE environment
Create a new GKE cluster that uses multi-networking (Preview) and create a GPU node pool that has the following characteristics:
- gVNIC enabled
- Multi-networking subnets specified for each secondary NIC
- A3 machine series with H100 GPUs backing the nodes
- Latest NVIDIA drivers installed
You can't update an existing cluster to use multi-networking.
GPUDirect-TCPXO
Choose an available GKE version that supports GPUDirect-TCPXO. To list the versions, run this command:
gcloud container get-server-config \
    --format="yaml(validMasterVersions)" \
    --region=REGION \
    --project=PROJECT_ID

Replace the following:
- REGION: the compute region for the cluster control plane.
- PROJECT_ID: your Google Cloud project ID.
Create a cluster:
gcloud beta container clusters create CLUSTER_NAME \
    --enable-dataplane-v2 \
    --enable-ip-alias \
    --location=CONTROL_PLANE_LOCATION \
    --enable-multi-networking \
    --cluster-version=VERSION \
    --no-enable-autoupgrade \
    --project=PROJECT_ID

Replace the following:
- CLUSTER_NAME: the name of your new cluster.
- VERSION: a GKE version that supports GPUDirect-TCPXO, as described in Requirements.
- CONTROL_PLANE_LOCATION: the Compute Engine location of the control plane of your cluster. Provide a region for regional clusters, or a zone for zonal clusters.
Create Network and GKENetworkParamSet resources in the cluster that correspond to the VPC networks and subnetworks that you created:
kubectl apply -f - <<EOF
apiVersion: networking.gke.io/v1
kind: Network
metadata:
  name: vpc1
spec:
  parametersRef:
    group: networking.gke.io
    kind: GKENetworkParamSet
    name: vpc1
  type: Device
---
apiVersion: networking.gke.io/v1
kind: Network
metadata:
  name: vpc2
spec:
  parametersRef:
    group: networking.gke.io
    kind: GKENetworkParamSet
    name: vpc2
  type: Device
---
apiVersion: networking.gke.io/v1
kind: Network
metadata:
  name: vpc3
spec:
  parametersRef:
    group: networking.gke.io
    kind: GKENetworkParamSet
    name: vpc3
  type: Device
---
apiVersion: networking.gke.io/v1
kind: Network
metadata:
  name: vpc4
spec:
  parametersRef:
    group: networking.gke.io
    kind: GKENetworkParamSet
    name: vpc4
  type: Device
---
apiVersion: networking.gke.io/v1
kind: Network
metadata:
  name: vpc5
spec:
  parametersRef:
    group: networking.gke.io
    kind: GKENetworkParamSet
    name: vpc5
  type: Device
---
apiVersion: networking.gke.io/v1
kind: Network
metadata:
  name: vpc6
spec:
  parametersRef:
    group: networking.gke.io
    kind: GKENetworkParamSet
    name: vpc6
  type: Device
---
apiVersion: networking.gke.io/v1
kind: Network
metadata:
  name: vpc7
spec:
  parametersRef:
    group: networking.gke.io
    kind: GKENetworkParamSet
    name: vpc7
  type: Device
---
apiVersion: networking.gke.io/v1
kind: Network
metadata:
  name: vpc8
spec:
  parametersRef:
    group: networking.gke.io
    kind: GKENetworkParamSet
    name: vpc8
  type: Device
---
apiVersion: networking.gke.io/v1
kind: GKENetworkParamSet
metadata:
  name: vpc1
spec:
  vpc: PREFIX-net-1
  vpcSubnet: PREFIX-sub-1
  deviceMode: NetDevice
---
apiVersion: networking.gke.io/v1
kind: GKENetworkParamSet
metadata:
  name: vpc2
spec:
  vpc: PREFIX-net-2
  vpcSubnet: PREFIX-sub-2
  deviceMode: NetDevice
---
apiVersion: networking.gke.io/v1
kind: GKENetworkParamSet
metadata:
  name: vpc3
spec:
  vpc: PREFIX-net-3
  vpcSubnet: PREFIX-sub-3
  deviceMode: NetDevice
---
apiVersion: networking.gke.io/v1
kind: GKENetworkParamSet
metadata:
  name: vpc4
spec:
  vpc: PREFIX-net-4
  vpcSubnet: PREFIX-sub-4
  deviceMode: NetDevice
---
apiVersion: networking.gke.io/v1
kind: GKENetworkParamSet
metadata:
  name: vpc5
spec:
  vpc: PREFIX-net-5
  vpcSubnet: PREFIX-sub-5
  deviceMode: NetDevice
---
apiVersion: networking.gke.io/v1
kind: GKENetworkParamSet
metadata:
  name: vpc6
spec:
  vpc: PREFIX-net-6
  vpcSubnet: PREFIX-sub-6
  deviceMode: NetDevice
---
apiVersion: networking.gke.io/v1
kind: GKENetworkParamSet
metadata:
  name: vpc7
spec:
  vpc: PREFIX-net-7
  vpcSubnet: PREFIX-sub-7
  deviceMode: NetDevice
---
apiVersion: networking.gke.io/v1
kind: GKENetworkParamSet
metadata:
  name: vpc8
spec:
  vpc: PREFIX-net-8
  vpcSubnet: PREFIX-sub-8
  deviceMode: NetDevice
EOF

These resources tell GKE to configure the NICs for GPU traffic in passthrough mode. GKE doesn't apply built-in networking programming using eBPF to this traffic.
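After applying the manifest, you can confirm that the resources exist; a minimal check that uses fully qualified resource names to avoid clashes with other resource types that are also called "network":

```bash
# Minimal check: list the Network and GKENetworkParamSet objects created above.
kubectl get networks.networking.gke.io
kubectl get gkenetworkparamsets.networking.gke.io
```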
GPUDirect-TCPX
Create a cluster:
gcloud beta container clusters create CLUSTER_NAME \
    --enable-dataplane-v2 \
    --enable-ip-alias \
    --location=CONTROL_PLANE_LOCATION \
    --enable-multi-networking \
    --cluster-version=VERSION \
    --no-enable-autoupgrade \
    --project=PROJECT_ID

Replace the following:
- CLUSTER_NAME: the name of your new cluster.
- CONTROL_PLANE_LOCATION: the Compute Engine location of the control plane of your cluster. Provide a region for regional clusters, or a zone for zonal clusters.
- VERSION: a GKE version that supports GPUDirect-TCPX, as described in Requirements.
Create Network and GKENetworkParamSet resources in the cluster that correspond to the VPC networks and subnetworks that you created:
kubectl apply -f - <<EOF
apiVersion: networking.gke.io/v1
kind: Network
metadata:
  name: vpc1
spec:
  parametersRef:
    group: networking.gke.io
    kind: GKENetworkParamSet
    name: vpc1
  type: Device
---
apiVersion: networking.gke.io/v1
kind: Network
metadata:
  name: vpc2
spec:
  parametersRef:
    group: networking.gke.io
    kind: GKENetworkParamSet
    name: vpc2
  type: Device
---
apiVersion: networking.gke.io/v1
kind: Network
metadata:
  name: vpc3
spec:
  parametersRef:
    group: networking.gke.io
    kind: GKENetworkParamSet
    name: vpc3
  type: Device
---
apiVersion: networking.gke.io/v1
kind: Network
metadata:
  name: vpc4
spec:
  parametersRef:
    group: networking.gke.io
    kind: GKENetworkParamSet
    name: vpc4
  type: Device
---
apiVersion: networking.gke.io/v1
kind: GKENetworkParamSet
metadata:
  name: vpc1
spec:
  vpc: PREFIX-net-1
  vpcSubnet: PREFIX-sub-1
  deviceMode: NetDevice
---
apiVersion: networking.gke.io/v1
kind: GKENetworkParamSet
metadata:
  name: vpc2
spec:
  vpc: PREFIX-net-2
  vpcSubnet: PREFIX-sub-2
  deviceMode: NetDevice
---
apiVersion: networking.gke.io/v1
kind: GKENetworkParamSet
metadata:
  name: vpc3
spec:
  vpc: PREFIX-net-3
  vpcSubnet: PREFIX-sub-3
  deviceMode: NetDevice
---
apiVersion: networking.gke.io/v1
kind: GKENetworkParamSet
metadata:
  name: vpc4
spec:
  vpc: PREFIX-net-4
  vpcSubnet: PREFIX-sub-4
  deviceMode: NetDevice
EOF

These resources tell GKE to configure the NICs for GPU traffic in passthrough mode. GKE doesn't apply built-in networking programming using eBPF to this traffic.
Create a GPU node pool
Best practice: After you create the cluster, create a separate node pool to run the GPUs.

GPUDirect-TCPXO
Create a node pool for the H100 GPUs:
gcloud beta container node-pools create NODE_POOL_NAME \
    --location=CONTROL_PLANE_LOCATION \
    --cluster=CLUSTER_NAME \
    --project=PROJECT_ID \
    --accelerator=type=nvidia-h100-mega-80gb,count=8,gpu-driver-version=LATEST \
    --machine-type=a3-megagpu-8g \
    --num-nodes=2 \
    --additional-node-network network=PREFIX-net-1,subnetwork=PREFIX-sub-1 \
    --additional-node-network network=PREFIX-net-2,subnetwork=PREFIX-sub-2 \
    --additional-node-network network=PREFIX-net-3,subnetwork=PREFIX-sub-3 \
    --additional-node-network network=PREFIX-net-4,subnetwork=PREFIX-sub-4 \
    --additional-node-network network=PREFIX-net-5,subnetwork=PREFIX-sub-5 \
    --additional-node-network network=PREFIX-net-6,subnetwork=PREFIX-sub-6 \
    --additional-node-network network=PREFIX-net-7,subnetwork=PREFIX-sub-7 \
    --additional-node-network network=PREFIX-net-8,subnetwork=PREFIX-sub-8 \
    --enable-gvnic \
    --no-enable-autoupgrade \
    --scopes "https://www.googleapis.com/auth/cloud-platform" [ \
    --placement-policy=POLICY_NAME \
    --reservation-affinity=specific \
    --reservation=projects/PROJECT_ID/reservations/RESERVATION_NAME/reservationBlocks/BLOCK_NAME \
    --host-maintenance-interval=PERIODIC ]

Replace NODE_POOL_NAME with your node pool name.
In the example, the --scopes "https://www.googleapis.com/auth/cloud-platform" argument sets the node instance's scope to cloud-platform for testing convenience. For production, you may want to limit the scope to configure finer-grained credentials.
To use a reservation, use the --placement-policy, --reservation-affinity, and --reservation flags. Specify these flags to configure the policy name and reservation in the node pool. If the reservation doesn't require a resource policy, omit the --placement-policy flag.
The --reservation-affinity flag can take the values specific or any. However, for high-performance distributed AI workloads, we recommend that you use a specific reservation. You can find information about your reservation, such as the name of your reservation or the name of a specific block in your reservation. To find these values for on-demand reservations, view a list of your reservations, or view your future reservation requests.
Replace the following to use a reservation:
- PROJECT_ID: optionally, your Google Cloud project ID. If the reservation is located in the current project (not a shared reservation), you can omit projects/PROJECT_ID/reservations/ from the reservation value.
- RESERVATION_NAME: the name of your reservation.
- BLOCK_NAME: optionally, the name of a specific block within the reservation. Omit /reservationBlocks/BLOCK_NAME if you don't want to use a specific block.
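If you don't remember the exact reservation name, you can list the reservations that are visible to your project and copy the name from the output; a minimal sketch, where PROJECT_ID is a placeholder:

```bash
# Minimal sketch: list reservations in the project so you can copy the
# reservation name into the --reservation flag.
gcloud compute reservations list --project=PROJECT_ID
```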
If this command fails, you might not have enough H100 GPU quota in your project. Ensure that you have quota and retry the command.
GPUDirect-TCPX
Create a node pool for the H100 GPUs:
gcloud container node-pools create NODE_POOL_NAME \
    --cluster=CLUSTER_NAME \
    --location=CONTROL_PLANE_LOCATION \
    --machine-type=a3-highgpu-8g \
    --accelerator=type=nvidia-h100-80gb,count=8,gpu-driver-version=LATEST \
    --additional-node-network=network=PREFIX-net-1,subnetwork=PREFIX-sub-1 \
    --additional-node-network=network=PREFIX-net-2,subnetwork=PREFIX-sub-2 \
    --additional-node-network=network=PREFIX-net-3,subnetwork=PREFIX-sub-3 \
    --additional-node-network=network=PREFIX-net-4,subnetwork=PREFIX-sub-4 \
    --enable-gvnic \
    --no-enable-autoupgrade [ \
    --placement-policy=POLICY_NAME \
    --reservation-affinity=specific \
    --reservation=projects/PROJECT_ID/reservations/RESERVATION_NAME/reservationBlocks/BLOCK_NAME ]

Replace NODE_POOL_NAME with the name of the node pool.
To use a reservation, use the --placement-policy, --reservation-affinity, and --reservation flags. Specify these flags to configure the policy name and reservation in the node pool. If the reservation doesn't require a resource policy, omit the --placement-policy flag.
The --reservation-affinity flag can take the values specific or any. However, for high-performance distributed AI workloads, we recommend that you use a specific reservation. You can find information about your reservation, such as the name of your reservation or the name of a specific block in your reservation. To find these values for on-demand reservations, view a list of your reservations, or view your future reservation requests.
Replace the following to use a reservation:
- PROJECT_ID: optionally, your Google Cloud project ID. If the reservation is located in the current project (not a shared reservation), you can omit projects/PROJECT_ID/reservations/ from the reservation value.
- RESERVATION_NAME: the name of your reservation.
- BLOCK_NAME: optionally, the name of a specific block within the reservation. Omit /reservationBlocks/BLOCK_NAME if you don't want to use a specific block.
If this command fails, you might not have enough H100 GPU quota in your project. Ensure that you have quota and retry the command.
After you create the node pool, verify that each node has the attached GPUs:
Get a list of nodes in the cluster:
kubectl get nodes

Verify that each GPU node has eight GPUs:
kubectl describe node NODE_NAME

Replace NODE_NAME with the name of the node to describe. The output is similar to the following:
Capacity:
  ...
  nvidia.com/gpu: 8
Allocatable:
  ...
  nvidia.com/gpu: 8
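To check all nodes at once instead of describing them one at a time, you can project the allocatable GPU count into a table; a minimal sketch:

```bash
# Minimal sketch: show the allocatable GPU count for every node in one view.
kubectl get nodes -o custom-columns='NAME:.metadata.name,GPUS:.status.allocatable.nvidia\.com/gpu'
```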
Install the GPUDirect binary and configure NCCL
This section shows you how to install the GPUDirect binary, based on your A3 machine type (GPUDirect-TCPX for A3 High, GPUDirect-TCPXO for A3 Mega), and a specific NCCL library version by using a DaemonSet.
GPUDirect-TCPXO
This DaemonSet does the following:
- Performs pre-installation steps to set up GPUDirect-TCPXO-related configurations.
- Installs the NCCL library and GPUDirect-TCPXO binary on the node.
- Stores the library and the binary in the /home/kubernetes/bin/nvidia/lib64 directory on the VM. By default, GKE mounts this directory into the /usr/local/nvidia/lib64 path in GPU containers that need to use NCCL and GPUDirect-TCPXO.
To install the binary and configure NCCL, do the following steps:
Review the nccl-tcpxo-installer.yaml DaemonSet manifest in GitHub.

Deploy the DaemonSet:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/gpudirect-tcpxo/nccl-tcpxo-installer.yaml

The NCCL plugin takes approximately two minutes to start running.
Verify the status of the DaemonSet Pods:
kubectl get pods -n=kube-system -l=name=nccl-tcpxo-installer

The output is similar to the following:
nccl-tcpxo-installer-6c2pv    1/1    Running    0    2m11s
nccl-tcpxo-installer-qgg82    1/1    Running    0    2m11s
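If you script this step, you can block until the installer Pods are ready instead of polling manually; a minimal sketch that reuses the label selector from the command above:

```bash
# Minimal sketch: wait up to 5 minutes for the installer Pods to become Ready.
kubectl wait pod \
    -n kube-system \
    -l name=nccl-tcpxo-installer \
    --for=condition=Ready \
    --timeout=300s
```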
GPUDirect-TCPX
This DaemonSet does the following:
- Installs the NCCL library and GPUDirect-TCPX binary on the node.
- Stores the library and the binary in the /home/kubernetes/bin/nvidia/lib64 directory on the VM. By default, GKE mounts this directory into the /usr/local/nvidia/lib64 path in GPU containers that need to use NCCL and GPUDirect-TCPX.
To install the binary and configure NCCL, do the following:
Review the nccl-tcpx-installer.yaml DaemonSet manifest in GitHub.

Deploy the DaemonSet:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/gpudirect-tcpx/nccl-tcpx-installer.yaml

The NCCL plugin takes approximately two minutes to start running.
Verify the status of the DaemonSet Pods:
kubectl get pods -n=kube-system -l=name=nccl-tcpx-installer

The output is similar to the following:
nccl-tcpx-installer-6c2pv    1/1    Running    0    2m11s
nccl-tcpx-installer-qgg82    1/1    Running    0    2m11s
Deploy NRI device injector plugin
This section shows you how to install the NRI device injector by using a DaemonSet. Both H100 GPU machine types install the same NRI device injector plugin. This plugin does the following:
- Enables Node Resource Interface (NRI) on the node that has H100 GPUs. NRI is enabled by default on GKE version 1.29 and later.
- Deploys an NRI device injector plugin container that injects GPU devices into containers specified by Pod annotations.
To install the plugin, do the following:
Review the nri-device-injector.yaml Deployment manifest in GitHub.

Deploy the DaemonSet:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/nri_device_injector/nri-device-injector.yaml

The plugin takes approximately two minutes to start running.
Verify the status of the DaemonSet Pods:
kubectl get pods -n=kube-system -l=name=device-injector

The output is similar to the following:
device-injector-md6hb    1/1    Running    0    4h54m
device-injector-vh9bm    1/1    Running    0    4h54m
Deploy a test workload
In this section, you deploy a sample workload to verify that NCCL and GPUDirect-TCPX or GPUDirect-TCPXO work as expected. This sample workload does the following:
- Deploys two Pods, each of which runs in a node that has H100 GPUs.
- Deploys a sidecar container in each Pod to let those Pods use GPUDirect-TCPXO or GPUDirect-TCPX.
To deploy this sample workload, do the following:
GPUDirect-TCPXO
This workload includes a sidecar container named tcpxo-daemon, which runs a service that lets the Pod use GPUDirect-TCPXO. You must add this sidecar container to any Pods in your own environment that need to use GPUDirect-TCPXO. For a snippet of the required fields to add to your manifests, see Add GPUDirect to your manifests.
Review the nccl-test-latest.yaml manifest in GitHub.

Deploy two Pods with the test workload:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/gpudirect-tcpxo/nccl-test-latest.yaml

After the Pods deploy, trigger an all-gather test:
kubectl exec --stdin --tty --container=nccl-test nccl-test-host-1 -- /scripts/allgather.sh nccl-host-1 nccl-host-2

The output is similar to the following:
#                                     out-of-place                       in-place
#       size    count    type  redop  root    time  algbw  busbw  #wrong    time  algbw  busbw  #wrong
#        (B)  (elements)                       (us) (GB/s) (GB/s)            (us) (GB/s) (GB/s)
           0           0  float  none   -1    0.24   0.00   0.00      0    0.18   0.00   0.00      0
           0           0  float  none   -1    0.19   0.00   0.00      0    0.17   0.00   0.00      0
           0           0  float  none   -1    0.17   0.00   0.00      0    0.17   0.00   0.00      0
           0           0  float  none   -1    0.17   0.00   0.00      0    0.17   0.00   0.00      0
           0           0  float  none   -1    0.17   0.00   0.00      0    0.17   0.00   0.00      0
         256           4  float  none   -1   235.2   0.00   0.00      0   235.1   0.00   0.00      0
         512           8  float  none   -1   241.0   0.00   0.00      0   236.1   0.00   0.00      0
        1024          16  float  none   -1   236.3   0.00   0.00      0   233.3   0.00   0.00      0
        2048          32  float  none   -1   234.1   0.01   0.01      0   233.4   0.01   0.01      0
        4096          64  float  none   -1   237.1   0.02   0.02      0   235.3   0.02   0.02      0
        8192         128  float  none   -1   236.2   0.03   0.03      0   235.2   0.03   0.03      0
       16384         256  float  none   -1   236.6   0.07   0.06      0   238.5   0.07   0.06      0
       32768         512  float  none   -1   237.9   0.14   0.13      0   238.8   0.14   0.13      0
       65536        1024  float  none   -1   242.3   0.27   0.25      0   239.4   0.27   0.26      0
      131072        2048  float  none   -1   263.0   0.50   0.47      0   275.1   0.48   0.45      0
      262144        4096  float  none   -1   279.2   0.94   0.88      0   269.9   0.97   0.91      0
      524288        8192  float  none   -1   273.5   1.92   1.80      0   273.5   1.92   1.80      0
     1048576       16384  float  none   -1   315.1   3.33   3.12      0   314.1   3.34   3.13      0
     2097152       32768  float  none   -1   319.2   6.57   6.16      0   311.5   6.73   6.31      0
     4194304       65536  float  none   -1   331.8  12.64  11.85      0   331.3  12.66  11.87      0
     8388608      131072  float  none   -1   356.3  23.54  22.07      0   353.8  23.71  22.23      0
    16777216      262144  float  none   -1   409.1  41.01  38.45      0   405.2  41.40  38.81      0
    33554432      524288  float  none   -1   451.4  74.34  69.69      0   447.7  74.94  70.26      0
    67108864     1048576  float  none   -1   713.4  94.07  88.19      0   713.8  94.01  88.13      0
   134217728     2097152  float  none   -1  1122.1 119.62 112.14      0  1116.3 120.23 112.72      0
   268435456     4194304  float  none   -1  1785.8 150.32 140.92      0  1769.2 151.72 142.24      0
   536870912     8388608  float  none   -1  2859.7 187.74 176.00      0  2852.6 188.20 176.44      0
  1073741824    16777216  float  none   -1  5494.1 195.44 183.22      0  5568.2 192.83 180.78      0
  2147483648    33554432  float  none   -1   10841 198.09 185.71      0   10798 198.88 186.45      0
  4294967296    67108864  float  none   -1   21453 200.21 187.70      0   21490 199.86 187.37      0
  8589934592   134217728  float  none   -1   42603 201.63 189.03      0   42670 201.31 188.73      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 45.7587
#

Success: At this point, you've successfully installed GPUDirect-TCPXO on your nodes and can use it to optimize the throughput of GPU-heavy workloads that run on those nodes. The required fields to use GPUDirect-TCPXO in your own Pods are described in Add GPUDirect to your manifests in this document.
GPUDirect-TCPX
This workload includes a sidecar container named tcpx-daemon, which runs a service that lets the Pod use GPUDirect-TCPX. You must add this sidecar container to any Pods in your own environment that need to use GPUDirect-TCPX. For a snippet of the required fields to add to your manifests, see Add GPUDirect to your manifests.
Review the nccl-config.yaml ConfigMap manifest in GitHub. This manifest deploys scripts that initialize an NCCL all-gather test and set NCCL-specific configuration settings.

Review the nccl-test-latest.yaml Deployment manifest in GitHub.

Deploy the ConfigMap and the test workload:
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/gpudirect-tcpx/nccl-config.yaml
kubectl apply -f https://raw.githubusercontent.com/GoogleCloudPlatform/container-engine-accelerators/master/gpudirect-tcpx/nccl-test-latest.yaml

Run the following commands to trigger an NCCL all-gather test for the nodes:
kubectl exec \
    --stdin --tty --container=nccl-test nccl-test-host-1 \
    -- /configs/allgather.sh nccl-host-1 nccl-host-2

The output is similar to the following:
#                                     out-of-place                       in-place
#       size    count    type  redop  root    time  algbw  busbw  #wrong    time  algbw  busbw  #wrong
#        (B)  (elements)                       (us) (GB/s) (GB/s)            (us) (GB/s) (GB/s)
     1048576       16384  float  none   -1   696.8   1.50   1.41      0   729.0   1.44   1.35      0
     2097152       32768  float  none   -1   776.4   2.70   2.53      0   726.7   2.89   2.71      0
     4194304       65536  float  none   -1   774.3   5.42   5.08      0   805.1   5.21   4.88      0
     8388608      131072  float  none   -1   812.1  10.33   9.68      0   817.6  10.26   9.62      0
    16777216      262144  float  none   -1  1035.2  16.21  15.19      0  1067.8  15.71  14.73      0
    33554432      524288  float  none   -1  1183.3  28.36  26.59      0  1211.8  27.69  25.96      0
    67108864     1048576  float  none   -1  1593.4  42.12  39.49      0  1510.5  44.43  41.65      0
   134217728     2097152  float  none   -1  2127.8  63.08  59.13      0  2312.7  58.03  54.41      0
   268435456     4194304  float  none   -1  3603.0  74.50  69.85      0  3586.2  74.85  70.17      0
   536870912     8388608  float  none   -1  7101.7  75.60  70.87      0  7060.9  76.03  71.28      0
# Out of bounds values : 0 OK
# Avg bus bandwidth    : 29.8293
Use required NCCL configuration settings to improve performance
The following key-value pairs are the required NCCL configuration settings for GPUDirect-TCPX and GPUDirect-TCPXO. When deploying your workloads that use NCCL, set them as environment variables to optimize performance.
GPUDirect-TCPXO

"LD_LIBRARY_PATH=\"${LD_LIBRARY_PATH}:/usr/local/nvidia/lib64\"",
"NCCL_FASTRAK_CTRL_DEV=eth0",
"NCCL_FASTRAK_IFNAME=eth1,eth2,eth3,eth4,eth5,eth6,eth7,eth8",
"NCCL_SOCKET_IFNAME=eth0",
"NCCL_CROSS_NIC=0",
"NCCL_ALGO=Ring,Tree",
"NCCL_PROTO=Simple,LL128",
"NCCL_MIN_NCHANNELS=4",
"NCCL_TUNER_PLUGIN=libnccl-tuner.so",
"NCCL_TUNER_CONFIG_PATH=/usr/local/nvidia/lib64/a3plus_tuner_config.textproto",
"NCCL_SHIMNET_GUEST_CONFIG_CHECKER_CONFIG_FILE=/usr/local/nvidia/lib64/a3plus_guest_config.textproto",
"NCCL_DYNAMIC_CHUNK_SIZE=524288",
"NCCL_P2P_NET_CHUNKSIZE=524288",
"NCCL_P2P_PCI_CHUNKSIZE=524288",
"NCCL_P2P_NVL_CHUNKSIZE=1048576",
"NCCL_FASTRAK_NUM_FLOWS=2",
"NCCL_FASTRAK_USE_SNAP=1",
"NCCL_FASTRAK_PLUGIN_ACCEPT_TIMEOUT_MS=600000",
"NCCL_FASTRAK_ENABLE_CONTROL_CHANNEL=0",
"NCCL_BUFFSIZE=8388608",
"CUDA_VISIBLE_DEVICES=0,1,2,3,4,5,6,7",
"NCCL_NET_GDR_LEVEL=PIX",
"NCCL_FASTRAK_ENABLE_HOTPATH_LOGGING=0",
"NCCL_FASTRAK_USE_LLCM=1",
"NCCL_NVLS_ENABLE=0"

Optionally, you can set all the configurations at once by following these steps:
In your workload container manifest, add the following key-value pair as an environment variable:

    NCCL_LIB_DIR="/usr/local/nvidia/lib64"

Ensure that the nccl-env-profile.sh script is executed when your workload container starts. For example, you can do this in your Pod specification by overriding the container's command to include the following:

    source ${NCCL_LIB_DIR}/nccl-env-profile.sh
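As a concrete illustration, the container's startup script could source the profile before launching the training process; a minimal sketch in which the entrypoint path /workspace/run_training.sh is a hypothetical placeholder for your own command:

```bash
# Minimal sketch of a container startup script: export NCCL_LIB_DIR, source
# the NCCL environment profile, then exec the workload entrypoint.
# /workspace/run_training.sh is a hypothetical placeholder.
export NCCL_LIB_DIR="/usr/local/nvidia/lib64"
source "${NCCL_LIB_DIR}/nccl-env-profile.sh"
exec /workspace/run_training.sh
```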
Note: Starting with the us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpxo/nccl-plugin-gpudirecttcpx-dev:v1.0.9-1 NCCL plugin version, LL128 NCCL communication protocol support becomes the default tuning parameter in GPUDirect-TCPXO. To use or disable LL128, see the LL128 support section.

LL128 support
The NVIDIA LL128 (low-latency 128) NCCL communication protocol can significantly improve performance for small-to-medium sized collectives. GPUDirect-TCPXO supports the LL128 protocol.
To use LL128, ensure that the nccl-tcpxo-installer.yaml file in the Install the GPUDirect binary and configure NCCL section uses the following container image version or later:
us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpxo/nccl-plugin-gpudirecttcpx-dev:v1.0.8-1

To set up LL128, do the following:
For the us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpxo/nccl-plugin-gpudirecttcpx-dev:v1.0.8-1 NCCL plugin version, do these steps:

- In your workload manifest, set the following environment variable:

      NCCL_LIB_DIR="/usr/local/nvidia/lib64"

- Configure your workload to execute the nccl-env-profile-ll128.sh script when the container starts. In your workload manifest, set the following command:

      source ${NCCL_LIB_DIR}/nccl-env-profile-ll128.sh

The nccl-env-profile-ll128.sh script has the following environment variables:

- NCCL_PROTO=Simple,LL128
- NCCL_TUNER_CONFIG_PATH=/usr/local/nvidia/lib64/a3plus_tuner_config_ll128.textproto
- NCCL_SHIMNET_GUEST_CONFIG_CHECKER_CONFIG_FILE=/usr/local/nvidia/lib64/a3plus_guest_config_ll128.textproto
For the us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpxo/nccl-plugin-gpudirecttcpx-dev:v1.0.9-1 NCCL plugin version and later, LL128 becomes a default parameter, so sourcing either the nccl-env-profile.sh script or the nccl-env-profile-ll128.sh script enables LL128. To disable LL128:

- In your workload manifest, set the following environment variable:

      NCCL_LIB_DIR="/usr/local/nvidia/lib64"

- Configure your workload to execute the nccl-env-profile-simple.sh script when the container starts. In your workload manifest, set the following command:

      source ${NCCL_LIB_DIR}/nccl-env-profile-simple.sh

The nccl-env-profile-simple.sh script has the following environment variables:

- NCCL_PROTO=Simple
- NCCL_TUNER_CONFIG_PATH=/usr/local/nvidia/lib64/a3plus_tuner_config_simple.textproto
- NCCL_SHIMNET_GUEST_CONFIG_CHECKER_CONFIG_FILE=/usr/local/nvidia/lib64/a3plus_tuner_config_simple.textproto
GPUDirect-TCPX

"LD_LIBRARY_PATH=\"${LD_LIBRARY_PATH}:/usr/local/tcpx/lib64\"",
"NCCL_SOCKET_IFNAME=\"eth0\"",
"NCCL_ALGO=Ring",
"NCCL_PROTO=Simple",
"NCCL_CROSS_NIC=0",
"NCCL_NET_GDR_LEVEL=PIX",
"NCCL_P2P_PXN_LEVEL=0",
"NCCL_GPUDIRECTTCPX_SOCKET_IFNAME=eth1,eth2,eth3,eth4",
"NCCL_GPUDIRECTTCPX_CTRL_DEV=eth0",
"NCCL_DYNAMIC_CHUNK_SIZE=524288",
"NCCL_P2P_NET_CHUNKSIZE=524288",
"NCCL_P2P_PCI_CHUNKSIZE=524288",
"NCCL_P2P_NVL_CHUNKSIZE=1048576",
"NCCL_BUFFSIZE=4194304",
"NCCL_NSOCKS_PERTHREAD=4",
"NCCL_SOCKET_NTHREADS=1",
"NCCL_GPUDIRECTTCPX_TX_BINDINGS=\"eth1:8-21,112-125;eth2:8-21,112-125;eth3:60-73,164-177;eth4:60-73,164-177\"",
"NCCL_GPUDIRECTTCPX_RX_BINDINGS=\"eth1:22-35,126-139;eth2:22-35,126-139;eth3:74-87,178-191;eth4:74-87,178-191\"",
"NCCL_GPUDIRECTTCPX_PROGRAM_FLOW_STEERING_WAIT_MICROS=500000"

Note: eth0 is used for control traffic of the GPUDirect-TCPX workload. Avoid rate limiting or restricting the primary eth0 device. You can remove the NCCL_GPUDIRECTTCPX_CTRL_DEV setting, which specifies the network interface for GPUDirect-TCPX control traffic, and the control traffic will instead use its GPU-aligned network device. However, NCCL itself will continue to use eth0 for orchestration because it's set as the value for NCCL_SOCKET_IFNAME.

Collect NCCL debugging logs
To log NCCL errors, we recommend that you add the following NCCL config:
NCCL_DEBUG=INFO
NCCL_DEBUG_SUBSYS=INIT,NET,ENV,COLL,GRAPH
NCCL_DEBUG_FILE=/DIRECTORY/FILE_NAME.%h.%p

- NCCL_DEBUG=INFO: prints debugging information.
  - For large-scale workloads (64 nodes or more), extensive logging can occur. To avoid this scenario, and unless you specified NCCL_DEBUG_FILE, we recommend setting NCCL_DEBUG=WARN to limit logs to errors only.
- NCCL_DEBUG_SUBSYS: filters the subsystems for which NCCL collects debugging information. We recommend that you collect logs for the following subsystems:
  - INIT: the initialization phase of NCCL.
  - NET: the NCCL network.
  - ENV: the environment variables that NCCL uses.
  - COLL: collective operations.
  - GRAPH: topology detection and graph search.
  If you want to collect logs for different subsystems, see NCCL_DEBUG_SUBSYS in the NCCL documentation for a list of accepted values.
- NCCL_DEBUG_FILE (optional): directs the NCCL debug logging output to a file that you specify. This variable writes NCCL logs to standard files, which prevents the log output from mixing with application output. This variable also writes logs from different NCCL ranks to different files, which prevents the logs from mixing. Use the following filename format:

      /DIRECTORY/FILE_NAME.%h.%p

  Replace the following:
  - DIRECTORY: the directory where you want to store the log files.
  - FILE_NAME: the name of the log files.

  The placeholder %h resolves to the hostname of the node, while %p resolves to the process ID (PID) of the process that's generating the log.
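For example, in a debugging session you might export these variables (or set them in the Pod's env section) before launching the workload; a minimal sketch in which /tmp/nccl-logs is an example directory:

```bash
# Minimal sketch: enable NCCL debug logging and write per-host, per-process
# log files. The directory /tmp/nccl-logs is an example value.
mkdir -p /tmp/nccl-logs
export NCCL_DEBUG=INFO
export NCCL_DEBUG_SUBSYS=INIT,NET,ENV,COLL,GRAPH
export NCCL_DEBUG_FILE=/tmp/nccl-logs/nccl_log.%h.%p
```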
For more information about debugging NCCL logs, see Troubleshoot GPUs in GKE.
Add GPUDirect to your manifests
This section shows the required fields that you must add to your Kubernetes manifests for your Pods to use GPUDirect.
Depending on the type of GPUDirect, do the following:
GPUDirect-TCPXO
Add the following annotations to the Pod metadata. Without these annotations, hostNetwork: true will be required for the Pod, and privileged: true will be required for the tcpxo-daemon container.

metadata:
  annotations:
    devices.gke.io/container.tcpxo-daemon: |+
      - path: /dev/nvidia0
      - path: /dev/nvidia1
      - path: /dev/nvidia2
      - path: /dev/nvidia3
      - path: /dev/nvidia4
      - path: /dev/nvidia5
      - path: /dev/nvidia6
      - path: /dev/nvidia7
      - path: /dev/nvidiactl
      - path: /dev/nvidia-uvm
      - path: /dev/dmabuf_import_helper
    networking.gke.io/default-interface: 'eth0'
    networking.gke.io/interfaces: |
      [
        {"interfaceName":"eth0","network":"default"},
        {"interfaceName":"eth1","network":"vpc1"},
        {"interfaceName":"eth2","network":"vpc2"},
        {"interfaceName":"eth3","network":"vpc3"},
        {"interfaceName":"eth4","network":"vpc4"},
        {"interfaceName":"eth5","network":"vpc5"},
        {"interfaceName":"eth6","network":"vpc6"},
        {"interfaceName":"eth7","network":"vpc7"},
        {"interfaceName":"eth8","network":"vpc8"}
      ]

Add the following fields to the Pod specification:
spec:
  volumes:
  - name: libraries
    hostPath:
      path: /home/kubernetes/bin/nvidia/lib64
  - name: sys
    hostPath:
      path: /sys
  - name: proc-sys
    hostPath:
      path: /proc/sys
  - name: aperture-devices
    hostPath:
      path: /dev/aperture_devices

Add the following container to the manifest to run the tcpxo-daemon service. Replace TCPXO_DAEMON_IMAGE with the latest image, us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpxo/tcpgpudmarxd-dev:v1.0.17:

- name: tcpxo-daemon
  image: TCPXO_DAEMON_IMAGE
  imagePullPolicy: Always
  command: ["/bin/sh", "-c"]
  args:
    - |
      set -ex
      chmod 755 /fts/entrypoint_rxdm_container.sh
      /fts/entrypoint_rxdm_container.sh --num_hops=2 --num_nics=8 --uid= --alsologtostderr
  securityContext:
    capabilities:
      add:
        - NET_ADMIN
        - NET_BIND_SERVICE
  volumeMounts:
    - name: libraries
      mountPath: /usr/local/nvidia/lib64
    - name: sys
      mountPath: /hostsysfs
    - name: proc-sys
      mountPath: /hostprocsysfs
  env:
    - name: LD_LIBRARY_PATH
      value: /usr/local/nvidia/lib64

Add the following environment variable to every GPU container:
env:
  - name: LD_LIBRARY_PATH
    value: /usr/local/nvidia/lib64
  - name: NCCL_FASTRAK_LLCM_DEVICE_DIRECTORY
    value: /dev/aperture_devices

Add the following volumeMounts to every GPU container. Without the aperture_devices setup, privileged: true is required for GPU containers:

volumeMounts:
  - name: aperture-devices
    mountPath: /dev/aperture_devices

Add environment variables to configure NCCL options. For details, see Use recommended NCCL configuration settings to improve performance.
A completed Pod specification looks like the following:
apiVersion: v1
kind: Pod
metadata:
  name: a3plus-workloads
  annotations:
    devices.gke.io/container.tcpxo-daemon: |+
      - path: /dev/nvidia0
      - path: /dev/nvidia1
      - path: /dev/nvidia2
      - path: /dev/nvidia3
      - path: /dev/nvidia4
      - path: /dev/nvidia5
      - path: /dev/nvidia6
      - path: /dev/nvidia7
      - path: /dev/nvidiactl
      - path: /dev/nvidia-uvm
      - path: /dev/dmabuf_import_helper
    networking.gke.io/default-interface: 'eth0'
    networking.gke.io/interfaces: |
      [
        {"interfaceName":"eth0","network":"default"},
        {"interfaceName":"eth1","network":"vpc1"},
        {"interfaceName":"eth2","network":"vpc2"},
        {"interfaceName":"eth3","network":"vpc3"},
        {"interfaceName":"eth4","network":"vpc4"},
        {"interfaceName":"eth5","network":"vpc5"},
        {"interfaceName":"eth6","network":"vpc6"},
        {"interfaceName":"eth7","network":"vpc7"},
        {"interfaceName":"eth8","network":"vpc8"}
      ]
  ...
  containers:
    - name: tcpxo-daemon
      image: TCPXO_DAEMON_IMAGE
      imagePullPolicy: Always
      command: ["/bin/sh", "-c"]
      args:
        - |
          set -ex
          chmod 755 /fts/entrypoint_rxdm_container.sh
          /fts/entrypoint_rxdm_container.sh --num_hops=2 --num_nics=8 --uid= --alsologtostderr
      securityContext:
        capabilities:
          add:
            - NET_ADMIN
            - NET_BIND_SERVICE
      volumeMounts:
        - name: libraries
          mountPath: /usr/local/nvidia/lib64
        - name: sys
          mountPath: /hostsysfs
        - name: proc-sys
          mountPath: /hostprocsysfs
      env:
        - name: LD_LIBRARY_PATH
          value: /usr/local/nvidia/lib64
    - name: main-application-container
      ...
      env:
        - name: LD_LIBRARY_PATH
          value: /usr/local/nvidia/lib64
        - name: NCCL_FASTRAK_LLCM_DEVICE_DIRECTORY
          value: /dev/aperture_devices
      securityContext:
      volumeMounts:
        - name: aperture-devices
          mountPath: /dev/aperture_devices
      resources:
        limits:
          nvidia.com/gpu: 8
  volumes:
    - name: libraries
      hostPath:
        path: /home/kubernetes/bin/nvidia
    - name: sys
      hostPath:
        path: /sys
    - name: proc-sys
      hostPath:
        path: /proc/sys
    - name: aperture-devices
      hostPath:
        path: /dev/aperture_devices

GPUDirect-TCPX
Add the following annotations to the Pod metadata. Without these annotations, hostNetwork: true will be required for the Pod, and privileged: true will be required for the tcpx-daemon container.

metadata:
  annotations:
    devices.gke.io/container.tcpx-daemon: |+
      - path: /dev/nvidia0
      - path: /dev/nvidia1
      - path: /dev/nvidia2
      - path: /dev/nvidia3
      - path: /dev/nvidia4
      - path: /dev/nvidia5
      - path: /dev/nvidia6
      - path: /dev/nvidia7
      - path: /dev/nvidiactl
      - path: /dev/nvidia-uvm
    networking.gke.io/default-interface: 'eth0'
    networking.gke.io/interfaces: |
      [
        {"interfaceName":"eth0","network":"default"},
        {"interfaceName":"eth1","network":"vpc1"},
        {"interfaceName":"eth2","network":"vpc2"},
        {"interfaceName":"eth3","network":"vpc3"},
        {"interfaceName":"eth4","network":"vpc4"}
      ]

Add the following fields to the Pod specification:
spec:
  volumes:
  - name: libraries
    hostPath:
      path: /home/kubernetes/bin/nvidia/lib64
  - name: sys
    hostPath:
      path: /sys
  - name: proc-sys
    hostPath:
      path: /proc/sys

Add the following container to the manifest to run the tcpx-daemon service:

- name: tcpx-daemon
  image: us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpx/tcpgpudmarxd-dev:v2.0.9
  command:
    - /tcpgpudmarxd/build/app/tcpgpudmarxd
    - --gpu_nic_preset
    - a3vm
    - --gpu_shmem_type
    - fd
    - --uds_path
    - /run/tcpx
    - --setup_param
    - \"--verbose 128 2 0 \"
  securityContext:
    capabilities:
      add:
        - NET_ADMIN
  volumeMounts:
    - name: libraries
      mountPath: /usr/local/nvidia/lib64
    - name: tcpx-socket
      mountPath: /run/tcpx
    - name: sys
      mountPath: /hostsysfs
    - name: proc-sys
      mountPath: /hostprocsysfs
  env:
    - name: LD_LIBRARY_PATH
      value: /usr/local/nvidia/lib64

Add the following volume mounts to any containers that request GPUs:
volumeMounts:
  - name: tcpx-socket
    mountPath: /tmp
  - name: libraries
    mountPath: /usr/local/nvidia/lib64

Note: The default tcpx-socket path is /tmp for containers that request GPUs. If you set the NCCL_GPUDIRECTTCPX_UNIX_CLIENT_PREFIX environment variable to a value other than /tmp, GKE mounts the tcpx-socket volume to that mountPath.

Add environment variables to configure NCCL options. For details, see the Use recommended NCCL configuration settings to improve performance section in this document.
Add the following environment variable to every GPU container:
env:
  - name: LD_LIBRARY_PATH
    value: /usr/local/nvidia/lib64
A completed Pod specification looks like the following:
apiVersion: v1
kind: Pod
metadata:
  name: a3-gpu-workloads-example
  labels:
    name: a3-gpu-workloads-example
  annotations:
    devices.gke.io/container.tcpx-daemon: |+
      - path: /dev/nvidia0
      - path: /dev/nvidia1
      - path: /dev/nvidia2
      - path: /dev/nvidia3
      - path: /dev/nvidia4
      - path: /dev/nvidia5
      - path: /dev/nvidia6
      - path: /dev/nvidia7
      - path: /dev/nvidiactl
      - path: /dev/nvidia-uvm
    networking.gke.io/default-interface: 'eth0'
    networking.gke.io/interfaces: |
      [
        {"interfaceName":"eth0","network":"default"},
        {"interfaceName":"eth1","network":"vpc1"},
        {"interfaceName":"eth2","network":"vpc2"},
        {"interfaceName":"eth3","network":"vpc3"},
        {"interfaceName":"eth4","network":"vpc4"}
      ]
spec:
  containers:
    - name: tcpx-daemon
      image: us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpx/tcpgpudmarxd-dev:v2.0.11
      imagePullPolicy: Always
      command:
        - /tcpgpudmarxd/build/app/tcpgpudmarxd
        - --gpu_nic_preset
        - a3vm
        - --gpu_shmem_type
        - fd
        - --uds_path
        - /run/tcpx
        - --setup_param
        - \"--verbose 128 2 0 \"
      securityContext:
        capabilities:
          add:
            - NET_ADMIN
      volumeMounts:
        - name: libraries
          mountPath: /usr/local/nvidia/lib64
          readOnly: true
        - name: tcpx-socket
          mountPath: /run/tcpx
        - name: sys
          mountPath: /hostsysfs
        - name: proc-sys
          mountPath: /hostprocsysfs
      env:
        - name: LD_LIBRARY_PATH
          value: /usr/local/nvidia/lib64
    - name: a3-gpu-workloads-example
      ...
      volumeMounts:
        - name: tcpx-socket
          mountPath: /tmp
        - name: libraries
          mountPath: /usr/local/nvidia/lib64
          readOnly: true
      resources:
        limits:
          nvidia.com/gpu: 8
      env:
        - name: LD_LIBRARY_PATH
          value: /usr/local/nvidia/lib64
  ...
  volumes:
    - name: libraries
      hostPath:
        path: /home/kubernetes/bin/nvidia/lib64
    - name: tcpx-socket
      emptyDir: {}
    - name: sys
      hostPath:
        path: /sys
    - name: proc-sys
      hostPath:
        path: /proc/sys

What's next
- Read the GPUDirect-TCPXO release notes.
- Learn more about the best practice to run workloads with GPUDirect-TCPX(O).
- Learn about best practices for GKE networking.
- Learn more about the NVIDIA GPUDirect family of technologies for data movement and access on NVIDIA GPUs.
- Learn about current GPU version availability and requesting GPUs in GKE.