Create an A3 Mega, A3 High, or A3 Edge instance with GPUDirect enabled
This document describes the setup for A3 Mega, A3 High, or A3 Edge virtual machine (VM) instances that have eight NVIDIA H100 GPUs attached and use either of the following GPUDirect technologies: GPUDirect-TCPX or GPUDirect-TCPXO. To create an A3 High instance with fewer than eight GPUs, see Create an A3 High or A2 instance.

The GPUDirect technology that you use depends on the A3 machine type that you select.

- GPUDirect-TCPXO: an RDMA-like, offloaded networking stack that is supported on A3 Mega (a3-megagpu-8g) machine types that have eight H100 GPUs.
- GPUDirect-TCPX: an optimized version of guest TCP that provides lower latency and is supported on A3 High (a3-highgpu-8g) and A3 Edge (a3-edgegpu-8g) machine types that have eight H100 GPUs.
The A3 accelerator-optimized machine series has 208 vCPUs and up to 1,872 GB of memory. The a3-megagpu-8g, a3-highgpu-8g, and a3-edgegpu-8g machine types offer 80 GB of GPU memory per GPU. These machine types can get up to 1,800 Gbps of network bandwidth, which makes them ideal for large transformer-based language models, databases, and high performance computing (HPC).

Both GPUDirect-TCPX and GPUDirect-TCPXO use NVIDIA GPUDirect technology to increase performance and reduce latency for your A3 VMs. They achieve this by allowing data packet payloads to transfer directly from GPU memory to the network interface, bypassing the CPU and system memory. This is a form of remote direct memory access (RDMA). When combined with Google Virtual NIC (gVNIC), A3 VMs can deliver the highest throughput between VMs in a cluster when compared to the previous generation A2 or G2 accelerator-optimized machine types.

This document describes how to create an A3 Mega, A3 High, or A3 Edge VM and enable either GPUDirect-TCPX or GPUDirect-TCPXO to test the improved GPU network performance.
Before you begin
- To review limitations and additional prerequisite steps for creating instances with attached GPUs, such as selecting an OS image and checking GPU quota, see Overview of creating an instance with attached GPUs.
- If you haven't already, set up authentication. Authentication verifies your identity for access to Google Cloud services and APIs. To run code or samples from a local development environment, you can authenticate to Compute Engine by selecting one of the following options:
Install the Google Cloud CLI. After installation, initialize the Google Cloud CLI by running the following command:

  gcloud init

If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

Note: If you installed the gcloud CLI previously, make sure you have the latest version by running gcloud components update.

- Set a default region and zone.
Required roles
To get the permissions that you need to create VMs, ask your administrator to grant you the Compute Instance Admin (v1) (roles/compute.instanceAdmin.v1) IAM role on the project. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to create VMs. To see the exact permissions that are required, expand the Required permissions section:
Required permissions
The following permissions are required to create VMs:
- compute.instances.create on the project
- To use a custom image to create the VM: compute.images.useReadOnly on the image
- To use a snapshot to create the VM: compute.snapshots.useReadOnly on the snapshot
- To use an instance template to create the VM: compute.instanceTemplates.useReadOnly on the instance template
- To specify a subnet for your VM: compute.subnetworks.use on the project or on the chosen subnet
- To specify a static IP address for the VM: compute.addresses.use on the project
- To assign an external IP address to the VM when using a VPC network: compute.subnetworks.useExternalIp on the project or on the chosen subnet
- To assign a legacy network to the VM: compute.networks.use on the project
- To assign an external IP address to the VM when using a legacy network: compute.networks.useExternalIp on the project
- To set VM instance metadata for the VM: compute.instances.setMetadata on the project
- To set tags for the VM: compute.instances.setTags on the VM
- To set labels for the VM: compute.instances.setLabels on the VM
- To set a service account for the VM to use: compute.instances.setServiceAccount on the VM
- To create a new disk for the VM: compute.disks.create on the project
- To attach an existing disk in read-only or read-write mode: compute.disks.use on the disk
- To attach an existing disk in read-only mode: compute.disks.useReadOnly on the disk
You might also be able to get these permissions with custom roles or other predefined roles.
Overview
To test network performance with GPUDirect, complete the following steps:
- Set up one or more Virtual Private Cloud (VPC) networks that have a large MTU configured.
- Create your GPU instance.
Set up VPC networks
To enable efficient communication for your GPU VMs, you need to create a management network and one or more data networks. The management network is used for external access, for example SSH, and for most general network communication. The data networks are used for high-performance communication between the GPUs on different VMs, for example, for Remote Direct Memory Access (RDMA) traffic.

For these VPC networks, we recommend setting the maximum transmission unit (MTU) to a larger value. Higher MTU values increase the packet size and reduce the packet-header overhead, which increases payload data throughput. For more information about how to create VPC networks, see Create and verify a jumbo frame MTU network.
Create management network, subnet, and firewall rule
Complete the following steps to set up the management network:
Create the management network by using the networks create command:

  gcloud compute networks create NETWORK_NAME_PREFIX-mgmt-net \
      --project=PROJECT_ID \
      --subnet-mode=custom \
      --mtu=8244

Create the management subnet by using the networks subnets create command:

  gcloud compute networks subnets create NETWORK_NAME_PREFIX-mgmt-sub \
      --project=PROJECT_ID \
      --network=NETWORK_NAME_PREFIX-mgmt-net \
      --region=REGION \
      --range=192.168.0.0/24

Create firewall rules by using the firewall-rules create command.

Create a firewall rule for the management network.

  gcloud compute firewall-rules create NETWORK_NAME_PREFIX-mgmt-internal \
      --project=PROJECT_ID \
      --network=NETWORK_NAME_PREFIX-mgmt-net \
      --action=ALLOW \
      --rules=tcp:0-65535,udp:0-65535,icmp \
      --source-ranges=192.168.0.0/16

Create the tcp:22 firewall rule to limit which source IP addresses can connect to your VM by using SSH.

  gcloud compute firewall-rules create NETWORK_NAME_PREFIX-mgmt-external-ssh \
      --project=PROJECT_ID \
      --network=NETWORK_NAME_PREFIX-mgmt-net \
      --action=ALLOW \
      --rules=tcp:22 \
      --source-ranges=SSH_SOURCE_IP_RANGE

Create the icmp firewall rule that can be used to check for data transmission issues in the network.

  gcloud compute firewall-rules create NETWORK_NAME_PREFIX-mgmt-external-ping \
      --project=PROJECT_ID \
      --network=NETWORK_NAME_PREFIX-mgmt-net \
      --action=ALLOW \
      --rules=icmp \
      --source-ranges=0.0.0.0/0
Replace the following:
- NETWORK_NAME_PREFIX: the name prefix to use for the VPC networks and subnets.
- PROJECT_ID: your project ID.
- REGION: the region where you want to create the networks.
- SSH_SOURCE_IP_RANGE: an IP range in CIDR format. This specifies which source IP addresses can connect to your VM by using SSH.
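Optionally, you can confirm that the management network was created with the larger MTU. This is a quick sanity check, not a required step; it assumes that you have already replaced the placeholder values.

  gcloud compute networks describe NETWORK_NAME_PREFIX-mgmt-net \
      --project=PROJECT_ID \
      --format="value(mtu)"

The command should print 8244.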
Create data networks, subnets, and firewall rule
The number of data networks varies depending on the type of GPU machine you are creating.

A3 Mega

A3 Mega requires eight data networks. To create eight data networks, each with subnets and firewall rules, use the following command.

  for N in $(seq 1 8); do
    gcloud compute networks create NETWORK_NAME_PREFIX-data-net-$N \
        --project=PROJECT_ID \
        --subnet-mode=custom \
        --mtu=8244
    gcloud compute networks subnets create NETWORK_NAME_PREFIX-data-sub-$N \
        --project=PROJECT_ID \
        --network=NETWORK_NAME_PREFIX-data-net-$N \
        --region=REGION \
        --range=192.168.$N.0/24
    gcloud compute firewall-rules create NETWORK_NAME_PREFIX-data-internal-$N \
        --project=PROJECT_ID \
        --network=NETWORK_NAME_PREFIX-data-net-$N \
        --action=ALLOW \
        --rules=tcp:0-65535,udp:0-65535,icmp \
        --source-ranges=192.168.0.0/16
  done
A3 High and A3 Edge
A3 High and A3 Edge require four data networks. Use the following command to create four data networks, each with subnets and firewall rules.

  for N in $(seq 1 4); do
    gcloud compute networks create NETWORK_NAME_PREFIX-data-net-$N \
        --project=PROJECT_ID \
        --subnet-mode=custom \
        --mtu=8244
    gcloud compute networks subnets create NETWORK_NAME_PREFIX-data-sub-$N \
        --project=PROJECT_ID \
        --network=NETWORK_NAME_PREFIX-data-net-$N \
        --region=REGION \
        --range=192.168.$N.0/24
    gcloud compute firewall-rules create NETWORK_NAME_PREFIX-data-internal-$N \
        --project=PROJECT_ID \
        --network=NETWORK_NAME_PREFIX-data-net-$N \
        --action=ALLOW \
        --rules=tcp:0-65535,udp:0-65535,icmp \
        --source-ranges=192.168.0.0/16
  done
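Optionally, you can list the data subnets to confirm that the loop created the expected networks. This is a quick sanity check, not a required step.

  gcloud compute networks subnets list \
      --project=PROJECT_ID \
      --filter="name ~ ^NETWORK_NAME_PREFIX-data-sub" \
      --format="table(name,region,ipCidrRange)"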
Create A3 Mega instances (GPUDirect-TCPXO)
Create your A3 Mega instances by using the cos-121-lts or later Container-Optimized OS image or the rocky-linux-8-optimized-gcp-nvidia-580 Rocky Linux image.
COS
To test network performance with GPUDirect-TCPXO, create at least two A3 Mega VM instances. Create each VM by using the cos-121-lts or later Container-Optimized OS image and specifying the VPC networks that you created in the previous step.
A3 Mega VMs require nine Google Virtual NIC (gVNIC) network interfaces, one for the management network and eight for the data networks.
Based on the provisioning model that you want to use to create your VM, select one of the following options:
Standard
gcloud compute instances create VM_NAME \
    --project=PROJECT_ID \
    --zone=ZONE \
    --machine-type=a3-megagpu-8g \
    --maintenance-policy=TERMINATE \
    --restart-on-failure \
    --image-family=cos-121-lts \
    --image-project=cos-cloud \
    --boot-disk-size=BOOT_DISK_SIZE \
    --metadata=cos-update-strategy=update_disabled \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-mgmt-net,subnet=NETWORK_NAME_PREFIX-mgmt-sub \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-1,subnet=NETWORK_NAME_PREFIX-data-sub-1,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-2,subnet=NETWORK_NAME_PREFIX-data-sub-2,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-3,subnet=NETWORK_NAME_PREFIX-data-sub-3,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-4,subnet=NETWORK_NAME_PREFIX-data-sub-4,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-5,subnet=NETWORK_NAME_PREFIX-data-sub-5,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-6,subnet=NETWORK_NAME_PREFIX-data-sub-6,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-7,subnet=NETWORK_NAME_PREFIX-data-sub-7,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-8,subnet=NETWORK_NAME_PREFIX-data-sub-8,no-address
Replace the following:
- VM_NAME: the name of your VM instance.
- PROJECT_ID: the ID of your project.
- ZONE: a zone that supports your machine type.
- BOOT_DISK_SIZE: the size of the boot disk in GB, for example 50.
- NETWORK_NAME_PREFIX: the name prefix to use for the VPC networks and subnets.
Spot
gcloud compute instances create VM_NAME \
    --project=PROJECT_ID \
    --zone=ZONE \
    --machine-type=a3-megagpu-8g \
    --maintenance-policy=TERMINATE \
    --restart-on-failure \
    --image-family=cos-121-lts \
    --image-project=cos-cloud \
    --boot-disk-size=BOOT_DISK_SIZE \
    --metadata=cos-update-strategy=update_disabled \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-mgmt-net,subnet=NETWORK_NAME_PREFIX-mgmt-sub \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-1,subnet=NETWORK_NAME_PREFIX-data-sub-1,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-2,subnet=NETWORK_NAME_PREFIX-data-sub-2,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-3,subnet=NETWORK_NAME_PREFIX-data-sub-3,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-4,subnet=NETWORK_NAME_PREFIX-data-sub-4,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-5,subnet=NETWORK_NAME_PREFIX-data-sub-5,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-6,subnet=NETWORK_NAME_PREFIX-data-sub-6,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-7,subnet=NETWORK_NAME_PREFIX-data-sub-7,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-8,subnet=NETWORK_NAME_PREFIX-data-sub-8,no-address \
    --provisioning-model=SPOT \
    --instance-termination-action=TERMINATION_ACTION
Replace the following:
- VM_NAME: the name of your VM instance.
- PROJECT_ID: the ID of your project.
- ZONE: a zone that supports your machine type.
- BOOT_DISK_SIZE: the size of the boot disk in GB, for example 50.
- NETWORK_NAME_PREFIX: the name prefix to use for the VPC networks and subnets.
- TERMINATION_ACTION: whether to stop or delete the VM on preemption. Specify one of the following values:
  - To stop the VM: STOP
  - To delete the VM: DELETE
Flex-start
gcloud compute instances create VM_NAME \
    --project=PROJECT_ID \
    --zone=ZONE \
    --machine-type=a3-megagpu-8g \
    --maintenance-policy=TERMINATE \
    --restart-on-failure \
    --image-family=cos-121-lts \
    --image-project=cos-cloud \
    --boot-disk-size=BOOT_DISK_SIZE \
    --metadata=cos-update-strategy=update_disabled \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-mgmt-net,subnet=NETWORK_NAME_PREFIX-mgmt-sub \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-1,subnet=NETWORK_NAME_PREFIX-data-sub-1,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-2,subnet=NETWORK_NAME_PREFIX-data-sub-2,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-3,subnet=NETWORK_NAME_PREFIX-data-sub-3,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-4,subnet=NETWORK_NAME_PREFIX-data-sub-4,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-5,subnet=NETWORK_NAME_PREFIX-data-sub-5,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-6,subnet=NETWORK_NAME_PREFIX-data-sub-6,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-7,subnet=NETWORK_NAME_PREFIX-data-sub-7,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-8,subnet=NETWORK_NAME_PREFIX-data-sub-8,no-address \
    --provisioning-model=FLEX_START \
    --instance-termination-action=TERMINATION_ACTION \
    --max-run-duration=RUN_DURATION \
    --request-valid-for-duration=VALID_FOR_DURATION \
    --reservation-affinity=none
Replace the following:
- VM_NAME: the name of your VM instance.
- PROJECT_ID: the ID of your project.
- ZONE: a zone that supports your machine type.
- BOOT_DISK_SIZE: the size of the boot disk in GB, for example 50.
- NETWORK_NAME_PREFIX: the name prefix to use for the VPC networks and subnets.
- TERMINATION_ACTION: whether to stop or delete the VM at the end of its run duration. Specify one of the following values:
  - To stop the VM: STOP
  - To delete the VM: DELETE
- RUN_DURATION: the maximum time that the VM runs before Compute Engine stops or deletes it. You must format the value as the number of days, hours, minutes, or seconds followed by d, h, m, and s respectively. For example, a value of 30m defines a time of 30 minutes, and a value of 1h2m3s defines a time of one hour, two minutes, and three seconds. You can specify a value between 10 minutes and seven days.
- VALID_FOR_DURATION: the maximum time to wait for provisioning your requested resources. You must format the value as the number of days, hours, minutes, or seconds followed by d, h, m, and s respectively. Based on the zonal requirements for your workload, specify one of the following durations to help increase your chances that your VM creation request succeeds:
  - If your workload requires you to create the VM in a specific zone, then specify a duration between 90 seconds (90s) and two hours (2h). Longer durations give you higher chances of obtaining resources.
  - If the VM can run in any zone within the region, then specify a duration of zero seconds (0s). This value specifies that Compute Engine only allocates resources if they are immediately available. If the creation request fails because resources are unavailable, then retry the request in a different zone.
Reservation-bound
gcloud compute instances create VM_NAME \
    --project=PROJECT_ID \
    --zone=ZONE \
    --machine-type=a3-megagpu-8g \
    --maintenance-policy=TERMINATE \
    --restart-on-failure \
    --image-family=cos-121-lts \
    --image-project=cos-cloud \
    --boot-disk-size=BOOT_DISK_SIZE \
    --metadata=cos-update-strategy=update_disabled \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-mgmt-net,subnet=NETWORK_NAME_PREFIX-mgmt-sub \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-1,subnet=NETWORK_NAME_PREFIX-data-sub-1,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-2,subnet=NETWORK_NAME_PREFIX-data-sub-2,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-3,subnet=NETWORK_NAME_PREFIX-data-sub-3,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-4,subnet=NETWORK_NAME_PREFIX-data-sub-4,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-5,subnet=NETWORK_NAME_PREFIX-data-sub-5,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-6,subnet=NETWORK_NAME_PREFIX-data-sub-6,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-7,subnet=NETWORK_NAME_PREFIX-data-sub-7,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-8,subnet=NETWORK_NAME_PREFIX-data-sub-8,no-address \
    --provisioning-model=RESERVATION_BOUND \
    --instance-termination-action=TERMINATION_ACTION \
    --reservation-affinity=specific \
    --reservation=RESERVATION_URL
Replace the following:
- VM_NAME: the name of your VM instance.
- PROJECT_ID: the ID of your project.
- ZONE: a zone that supports your machine type.
- BOOT_DISK_SIZE: the size of the boot disk in GB, for example 50.
- NETWORK_NAME_PREFIX: the name prefix to use for the VPC networks and subnets.
- TERMINATION_ACTION: whether to stop or delete the VM at the end of the reservation period. Specify one of the following values:
  - To stop the VM: STOP
  - To delete the VM: DELETE
- RESERVATION_URL: the URL of the reservation that you want to consume. Specify one of the following values:
  - If you created the reservation in the same project: RESERVATION_NAME
  - If the reservation is in a different project, and your project can use it: projects/PROJECT_ID/reservations/RESERVATION_NAME
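Optionally, after a VM is created, you can confirm that it has the expected nine network interfaces. This is a quick check, not a required step:

  gcloud compute instances describe VM_NAME \
      --project=PROJECT_ID \
      --zone=ZONE \
      --format="value(networkInterfaces[].name)"

The output should list nine interfaces, typically named nic0 through nic8.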
Install GPU drivers
On each A3 Mega VM, install the GPU drivers.
Install the NVIDIA GPU drivers.
sudo cos-extensions install gpu -- --version=latest
Remount the path.
sudo mount --bind /var/lib/nvidia /var/lib/nvidia
sudo mount -o remount,exec /var/lib/nvidia
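Optionally, verify that the driver installation succeeded. On Container-Optimized OS, the installer places the NVIDIA tools under /var/lib/nvidia, so you can run nvidia-smi from that path; this is a quick check, not a required step.

  /var/lib/nvidia/bin/nvidia-smi

The output should list the eight attached H100 GPUs.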
Give the NICs access to the GPUs
On each A3 Mega VM, give the NICs access to the GPUs.
- Adjust the firewall settings to accept all incoming TCP connections and enable communication between the nodes in your cluster:
sudo /sbin/iptables -I INPUT -p tcp -m tcp -j ACCEPT
- Configure the dmabuf module. Load the import-helper module, which is part of the dmabuf framework. This framework enables high-speed, zero-copy memory sharing between the GPU and the network interface card (NIC), a critical component for GPUDirect technology. A quick check that the module loaded appears after this procedure.

  sudo modprobe import-helper
- Configure Docker to authenticate requests to Artifact Registry.
docker-credential-gcr configure-docker --registries us-docker.pkg.dev
- Launch RxDM in a container. RxDM is a management service that runs alongside the GPU application to manage GPU memory. This service pre-allocates and manages GPU memory for incoming network traffic, which is a key element of GPUDirect technology and essential for high-performance networking. Start a Docker container named rxdm:

  docker run --pull=always --rm --detach --name rxdm \
      --network=host --cap-add=NET_ADMIN \
      --privileged \
      --volume /var/lib/nvidia:/usr/local/nvidia \
      --device /dev/nvidia0:/dev/nvidia0 \
      --device /dev/nvidia1:/dev/nvidia1 \
      --device /dev/nvidia2:/dev/nvidia2 \
      --device /dev/nvidia3:/dev/nvidia3 \
      --device /dev/nvidia4:/dev/nvidia4 \
      --device /dev/nvidia5:/dev/nvidia5 \
      --device /dev/nvidia6:/dev/nvidia6 \
      --device /dev/nvidia7:/dev/nvidia7 \
      --device /dev/nvidia-uvm:/dev/nvidia-uvm \
      --device /dev/nvidiactl:/dev/nvidiactl \
      --device /dev/dmabuf_import_helper:/dev/dmabuf_import_helper \
      --env LD_LIBRARY_PATH=/usr/local/nvidia/lib64 \
      us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpxo/tcpgpudmarxd-dev:v1.0.21 \
      --num_hops=2 --num_nics=8
To verify that RxDM has started successfully, run the following command. Wait for the message "Buffer manager initialization complete" to confirm successful RxDM initialization.

  docker container logs --follow rxdm

Alternatively, check for the RxDM initialization completion log entry:

  docker container logs rxdm 2>&1 | grep "Buffer manager initialization complete"
Set up NCCL environment
On each A3 Mega VM, complete the following steps:
- Install the nccl-net library, a plugin for NCCL that enables GPUDirect communication over the network. The following commands pull the installer image and install the necessary library files into /var/lib/tcpxo/lib64/ (a quick check of the installed files appears after this procedure).

  NCCL_NET_IMAGE="us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpxo/nccl-plugin-gpudirecttcpx-dev:v1.0.15"
  docker run --pull=always --rm --privileged \
      --network=host --cap-add=NET_ADMIN \
      --volume /var/lib/nvidia:/usr/local/nvidia \
      --volume /var/lib:/var/lib \
      --device /dev/nvidia0:/dev/nvidia0 \
      --device /dev/nvidia1:/dev/nvidia1 \
      --device /dev/nvidia2:/dev/nvidia2 \
      --device /dev/nvidia3:/dev/nvidia3 \
      --device /dev/nvidia4:/dev/nvidia4 \
      --device /dev/nvidia5:/dev/nvidia5 \
      --device /dev/nvidia6:/dev/nvidia6 \
      --device /dev/nvidia7:/dev/nvidia7 \
      --device /dev/nvidia-uvm:/dev/nvidia-uvm \
      --device /dev/nvidiactl:/dev/nvidiactl \
      --device /dev/dmabuf_import_helper:/dev/dmabuf_import_helper \
      --env LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/var/lib/tcpxo/lib64 \
      ${NCCL_NET_IMAGE} install --install-nccl
  sudo mount --bind /var/lib/tcpxo /var/lib/tcpxo && sudo mount -o remount,exec /var/lib/tcpxo

- Launch a dedicated nccl-tests container for NCCL testing. This container comes pre-configured with the necessary tools and utility scripts, ensuring a clean and consistent environment for verifying GPUDirect setup performance. This command reuses the NCCL_NET_IMAGE variable that you set in the previous step.

  docker run --pull=always --rm --detach --name nccl \
      --network=host --cap-add=NET_ADMIN \
      --privileged \
      --volume /var/lib/nvidia:/usr/local/nvidia \
      --volume /var/lib/tcpxo:/var/lib/tcpxo \
      --shm-size=8g \
      --device /dev/nvidia0:/dev/nvidia0 \
      --device /dev/nvidia1:/dev/nvidia1 \
      --device /dev/nvidia2:/dev/nvidia2 \
      --device /dev/nvidia3:/dev/nvidia3 \
      --device /dev/nvidia4:/dev/nvidia4 \
      --device /dev/nvidia5:/dev/nvidia5 \
      --device /dev/nvidia6:/dev/nvidia6 \
      --device /dev/nvidia7:/dev/nvidia7 \
      --device /dev/nvidia-uvm:/dev/nvidia-uvm \
      --device /dev/nvidiactl:/dev/nvidiactl \
      --device /dev/dmabuf_import_helper:/dev/dmabuf_import_helper \
      --env LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/var/lib/tcpxo/lib64 \
      ${NCCL_NET_IMAGE} daemon
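Optionally, confirm that the installer placed the plugin libraries in the directory mentioned earlier; this is a quick check, not a required step.

  ls /var/lib/tcpxo/lib64/ | grep -i nccl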
Run nccl-tests benchmark
To run the nccl-tests benchmark, complete the following steps on a single A3 Mega VM:

- Open an interactive bash shell inside the nccl-tests container. Complete the remaining steps from inside this shell.

  docker exec -it nccl bash

- Configure the environment for a multi-node run by setting up SSH and generating host files. Replace VM_NAME_1 and VM_NAME_2 with the names of each VM.

  /scripts/init_ssh.sh VM_NAME_1 VM_NAME_2
  /scripts/gen_hostfiles.sh VM_NAME_1 VM_NAME_2

  This creates a directory named /scripts/hostfiles2.

- Run the all_gather_perf benchmark to measure collective communication performance:

  /scripts/run-nccl-tcpxo.sh all_gather_perf "${LD_LIBRARY_PATH}" 8 eth1,eth2,eth3,eth4,eth5,eth6,eth7,eth8 1M 512M 3 2 10 8 2 10
Rocky
To test network performance with GPUDirect-TCPXO, create at least two A3 Mega VM instances. Create each VM by using the rocky-linux-8-optimized-gcp-nvidia-580 or later Rocky Linux image and specifying the VPC networks that you created in the previous step.
A3 Mega VMs require nine Google Virtual NIC (gVNIC) network interfaces, one for the management network and eight for the data networks.
Based on the provisioning model that you want to use to create your VM, select one of the following options:
Standard
gcloud compute instances create VM_NAME \
    --project=PROJECT_ID \
    --zone=ZONE \
    --machine-type=a3-megagpu-8g \
    --maintenance-policy=TERMINATE \
    --restart-on-failure \
    --image-family=rocky-linux-8-optimized-gcp-nvidia-580 \
    --image-project=rocky-linux-accelerator-cloud \
    --boot-disk-size=BOOT_DISK_SIZE \
    --metadata=cos-update-strategy=update_disabled \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-mgmt-net,subnet=NETWORK_NAME_PREFIX-mgmt-sub \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-1,subnet=NETWORK_NAME_PREFIX-data-sub-1,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-2,subnet=NETWORK_NAME_PREFIX-data-sub-2,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-3,subnet=NETWORK_NAME_PREFIX-data-sub-3,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-4,subnet=NETWORK_NAME_PREFIX-data-sub-4,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-5,subnet=NETWORK_NAME_PREFIX-data-sub-5,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-6,subnet=NETWORK_NAME_PREFIX-data-sub-6,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-7,subnet=NETWORK_NAME_PREFIX-data-sub-7,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-8,subnet=NETWORK_NAME_PREFIX-data-sub-8,no-address
Replace the following:
- VM_NAME: the name of your VM instance.
- PROJECT_ID: the ID of your project.
- ZONE: a zone that supports your machine type.
- BOOT_DISK_SIZE: the size of the boot disk in GB, for example 50.
- NETWORK_NAME_PREFIX: the name prefix to use for the VPC networks and subnets.
Spot
gcloud compute instances create VM_NAME \
    --project=PROJECT_ID \
    --zone=ZONE \
    --machine-type=a3-megagpu-8g \
    --maintenance-policy=TERMINATE \
    --restart-on-failure \
    --image-family=rocky-linux-8-optimized-gcp-nvidia-580 \
    --image-project=rocky-linux-accelerator-cloud \
    --boot-disk-size=BOOT_DISK_SIZE \
    --metadata=cos-update-strategy=update_disabled \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-mgmt-net,subnet=NETWORK_NAME_PREFIX-mgmt-sub \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-1,subnet=NETWORK_NAME_PREFIX-data-sub-1,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-2,subnet=NETWORK_NAME_PREFIX-data-sub-2,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-3,subnet=NETWORK_NAME_PREFIX-data-sub-3,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-4,subnet=NETWORK_NAME_PREFIX-data-sub-4,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-5,subnet=NETWORK_NAME_PREFIX-data-sub-5,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-6,subnet=NETWORK_NAME_PREFIX-data-sub-6,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-7,subnet=NETWORK_NAME_PREFIX-data-sub-7,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-8,subnet=NETWORK_NAME_PREFIX-data-sub-8,no-address \
    --provisioning-model=SPOT \
    --instance-termination-action=TERMINATION_ACTION
Replace the following:
- VM_NAME: the name of your VM instance.
- PROJECT_ID: the ID of your project.
- ZONE: a zone that supports your machine type.
- BOOT_DISK_SIZE: the size of the boot disk in GB, for example 50.
- NETWORK_NAME_PREFIX: the name prefix to use for the VPC networks and subnets.
- TERMINATION_ACTION: whether to stop or delete the VM on preemption. Specify one of the following values:
  - To stop the VM: STOP
  - To delete the VM: DELETE
Flex-start
gcloud compute instances create VM_NAME \
    --project=PROJECT_ID \
    --zone=ZONE \
    --machine-type=a3-megagpu-8g \
    --maintenance-policy=TERMINATE \
    --restart-on-failure \
    --image-family=rocky-linux-8-optimized-gcp-nvidia-580 \
    --image-project=rocky-linux-accelerator-cloud \
    --boot-disk-size=BOOT_DISK_SIZE \
    --metadata=cos-update-strategy=update_disabled \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-mgmt-net,subnet=NETWORK_NAME_PREFIX-mgmt-sub \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-1,subnet=NETWORK_NAME_PREFIX-data-sub-1,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-2,subnet=NETWORK_NAME_PREFIX-data-sub-2,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-3,subnet=NETWORK_NAME_PREFIX-data-sub-3,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-4,subnet=NETWORK_NAME_PREFIX-data-sub-4,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-5,subnet=NETWORK_NAME_PREFIX-data-sub-5,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-6,subnet=NETWORK_NAME_PREFIX-data-sub-6,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-7,subnet=NETWORK_NAME_PREFIX-data-sub-7,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-8,subnet=NETWORK_NAME_PREFIX-data-sub-8,no-address \
    --provisioning-model=FLEX_START \
    --instance-termination-action=TERMINATION_ACTION \
    --max-run-duration=RUN_DURATION \
    --request-valid-for-duration=VALID_FOR_DURATION \
    --reservation-affinity=none
Replace the following:
- VM_NAME: the name of your VM instance.
- PROJECT_ID: the ID of your project.
- ZONE: a zone that supports your machine type.
- BOOT_DISK_SIZE: the size of the boot disk in GB, for example 50.
- NETWORK_NAME_PREFIX: the name prefix to use for the VPC networks and subnets.
- TERMINATION_ACTION: whether to stop or delete the VM at the end of its run duration. Specify one of the following values:
  - To stop the VM: STOP
  - To delete the VM: DELETE
- RUN_DURATION: the maximum time that the VM runs before Compute Engine stops or deletes it. You must format the value as the number of days, hours, minutes, or seconds followed by d, h, m, and s respectively. For example, a value of 30m defines a time of 30 minutes, and a value of 1h2m3s defines a time of one hour, two minutes, and three seconds. You can specify a value between 10 minutes and seven days.
- VALID_FOR_DURATION: the maximum time to wait for provisioning your requested resources. You must format the value as the number of days, hours, minutes, or seconds followed by d, h, m, and s respectively. Based on the zonal requirements for your workload, specify one of the following durations to help increase your chances that your VM creation request succeeds:
  - If your workload requires you to create the VM in a specific zone, then specify a duration between 90 seconds (90s) and two hours (2h). Longer durations give you higher chances of obtaining resources.
  - If the VM can run in any zone within the region, then specify a duration of zero seconds (0s). This value specifies that Compute Engine only allocates resources if they are immediately available. If the creation request fails because resources are unavailable, then retry the request in a different zone.
Reservation-bound
gcloud compute instances create VM_NAME \
    --project=PROJECT_ID \
    --zone=ZONE \
    --machine-type=a3-megagpu-8g \
    --maintenance-policy=TERMINATE \
    --restart-on-failure \
    --image-family=rocky-linux-8-optimized-gcp-nvidia-580 \
    --image-project=rocky-linux-accelerator-cloud \
    --boot-disk-size=BOOT_DISK_SIZE \
    --metadata=cos-update-strategy=update_disabled \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-mgmt-net,subnet=NETWORK_NAME_PREFIX-mgmt-sub \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-1,subnet=NETWORK_NAME_PREFIX-data-sub-1,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-2,subnet=NETWORK_NAME_PREFIX-data-sub-2,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-3,subnet=NETWORK_NAME_PREFIX-data-sub-3,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-4,subnet=NETWORK_NAME_PREFIX-data-sub-4,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-5,subnet=NETWORK_NAME_PREFIX-data-sub-5,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-6,subnet=NETWORK_NAME_PREFIX-data-sub-6,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-7,subnet=NETWORK_NAME_PREFIX-data-sub-7,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-8,subnet=NETWORK_NAME_PREFIX-data-sub-8,no-address \
    --provisioning-model=RESERVATION_BOUND \
    --instance-termination-action=TERMINATION_ACTION \
    --reservation-affinity=specific \
    --reservation=RESERVATION_URL
Replace the following:
- VM_NAME: the name of your VM instance.
- PROJECT_ID: the ID of your project.
- ZONE: a zone that supports your machine type.
- BOOT_DISK_SIZE: the size of the boot disk in GB, for example 50.
- NETWORK_NAME_PREFIX: the name prefix to use for the VPC networks and subnets.
- TERMINATION_ACTION: whether to stop or delete the VM at the end of the reservation period. Specify one of the following values:
  - To stop the VM: STOP
  - To delete the VM: DELETE
- RESERVATION_URL: the URL of the reservation that you want to consume. Specify one of the following values:
  - If you created the reservation in the same project: RESERVATION_NAME
  - If the reservation is in a different project, and your project can use it: projects/PROJECT_ID/reservations/RESERVATION_NAME
Give the NICs access to the GPUs
On each A3 Mega VM, give the NICs access to the GPUs.
- Install the prerequisite packages for the dmabuf module:

  sudo dnf -y install epel-release kernel-devel-$(uname -r) git
  sudo dnf -y install dkms
- Install and load the import-helper module, which is part of the dmabuf framework. This framework enables high-speed, zero-copy memory sharing between the GPU and the network interface card (NIC), a critical component for GPUDirect technology:

  sudo git clone https://github.com/google/dmabuf_importer_helper.git /usr/src/import_helper-1.0
  sudo dkms install import_helper/1.0
  sudo modprobe import_helper
- Install and configure Docker to authenticate requests to Artifact Registry.

  sudo dnf config-manager --add-repo https://download.docker.com/linux/rhel/docker-ce.repo
  sudo dnf -y install docker-ce docker-ce-cli containerd.io docker-buildx-plugin docker-compose-plugin
  sudo systemctl --now enable docker
  sudo usermod -a -G docker $(whoami)
  curl -s -L https://nvidia.github.io/libnvidia-container/stable/rpm/nvidia-container-toolkit.repo | \
      sudo tee /etc/yum.repos.d/nvidia-container-toolkit.repo
  export NVIDIA_CONTAINER_TOOLKIT_VERSION=1.17.8-1
  sudo dnf install -y \
      nvidia-container-toolkit-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
      nvidia-container-toolkit-base-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
      libnvidia-container-tools-${NVIDIA_CONTAINER_TOOLKIT_VERSION} \
      libnvidia-container1-${NVIDIA_CONTAINER_TOOLKIT_VERSION}
  sudo nvidia-ctk runtime configure --runtime=docker
  sudo systemctl restart docker
  gcloud auth configure-docker us-docker.pkg.dev

  After running the preceding commands, you must log out and log back in for the changes to take effect.
- Launch RxDM in a container. RxDM is a management service that runs alongside the GPU application to manage GPU memory. This service pre-allocates and manages GPU memory for incoming network traffic, which is a key element of GPUDirect technology and essential for high-performance networking. Start a Docker container named rxdm:

  docker run --pull=always --rm --detach --name rxdm \
      --network=host --cap-add=NET_ADMIN \
      --gpus all \
      --privileged \
      --device /dev/dmabuf_import_helper:/dev/dmabuf_import_helper \
      --env LD_LIBRARY_PATH=/usr/lib64 \
      us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpxo/tcpgpudmarxd-dev:v1.0.21 \
      --num_hops=2 --nic_metric_directory=/tmp
To verify that RxDM has started successfully, run the following command. Wait for the message "Buffer manager initialization complete" to confirm successful RxDM initialization.

  docker container logs --follow rxdm

Alternatively, check for the RxDM initialization completion log entry:

  docker container logs rxdm 2>&1 | grep "Buffer manager initialization complete"
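Optionally, before you set up the NCCL environment, you can confirm that the GPU driver and the NVIDIA Container Toolkit are working. The CUDA image tag below is only an example placeholder; any CUDA base image you already use works.

  # The Rocky Linux accelerator image ships with the NVIDIA driver, so this
  # should list the eight H100 GPUs.
  nvidia-smi

  # Confirm that Docker can hand GPUs to a container (example image tag).
  docker run --rm --gpus all nvidia/cuda:12.4.1-base-ubuntu22.04 nvidia-smi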
Set up NCCL environment
On each A3 Mega VM, complete the following steps:
- Install the nccl-net library, a plugin for NCCL that enables GPUDirect communication over the network. The following commands pull the installer image and install the necessary library files into /var/lib/tcpxo/lib64/.

  NCCL_NET_IMAGE="us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpxo/nccl-plugin-gpudirecttcpx-dev:v1.0.15"
  docker run --pull=always --rm --privileged \
      --network=host --cap-add=NET_ADMIN \
      --volume /var/lib:/var/lib \
      --env LD_LIBRARY_PATH=/usr/lib64:/var/lib/tcpxo/lib64 \
      ${NCCL_NET_IMAGE} install --install-nccl
  sudo mount --bind /var/lib/tcpxo /var/lib/tcpxo && sudo mount -o remount,exec /var/lib/tcpxo

- Configure the aperture mount flags. The following loop finds the aperture PCI devices and builds the bind-mount flags that the NCCL test container needs (you can verify the generated flags with the check after this procedure):

  aperture_mount_flags=""
  while IFS= read -r line; do
    BDF=$( echo "$line" | awk '{print $1}' )
    host_aperture_device="/sys/bus/pci/devices/$BDF"
    # replace '/sys/bus/pci/devices' with '/dev/aperture_devices'
    container_aperture_device="${host_aperture_device/\/sys\/bus\/pci\/devices/\/dev\/aperture_devices}"
    aperture_mount_flags+="--mount type=bind,src=${host_aperture_device},target=${container_aperture_device} "
  done < <(lspci -nn -D | grep '1ae0:0084')
nccl-testsfor NCCL testing.This container comes pre-configured with the necessary tools and utility scripts, ensuring aclean and consistent environment for verifying GPUDirect setup performance.This command reuses the
NCCL_NET_IMAGEvariable that you set in the previous step.docker run --pull=always --rm --detach --name nccl \ --network=host --cap-add=NET_ADMIN \ --privileged \ --volume /var/lib/:/var/lib \ --shm-size=8g \ --gpus all \ $aperture_mount_flags \ --device /dev/dmabuf_import_helper:/dev/dmabuf_import_helper \ --env LD_LIBRARY_PATH=/usr/lib64:/var/lib/tcpxo/lib64 \ ${NCCL_NET_IMAGE} daemon
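Before launching the container, you can confirm that the aperture flag loop found devices; if the variable is empty, the lspci filter did not match any devices on the VM.

  echo "$aperture_mount_flags"

The output should contain one --mount flag for each aperture device that lspci reported.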
Run nccl-tests benchmark
To run the nccl-tests benchmark, complete the following steps on a single A3 Mega VM:

- Open an interactive bash shell inside the nccl-tests container. Complete the remaining steps from inside this shell.

  docker exec -it nccl bash

- Configure the environment for a multi-node run by setting up SSH and generating host files. Replace VM_NAME_1 and VM_NAME_2 with the names of each VM.

  /scripts/init_ssh.sh VM_NAME_1 VM_NAME_2
  /scripts/gen_hostfiles.sh VM_NAME_1 VM_NAME_2

  This creates a directory named /scripts/hostfiles2.

- Run the all_gather_perf benchmark to measure collective communication performance:

  /scripts/run-nccl-tcpxo.sh all_gather_perf "${LD_LIBRARY_PATH}" 8 enp134s0,enp135s0,enp13s0,enp14s0,enp141s0,enp142s0,enp6s0,enp7s0 1M 8G 3 2 10 8 2 10 enp0s12
Create A3 High and Edge instances (GPUDirect-TCPX)
Create your A3 High and Edge instances by using the cos-121-lts or later Container-Optimized OS image.
COS
To test network performance with GPUDirect-TCPX, you need to create at least two A3 High or Edge VMs. Create each VM by using the cos-121-lts or later Container-Optimized OS image and specifying the VPC networks that you created in the previous step.
The VMs must use the Google Virtual NIC (gVNIC) network interface. For A3 High or Edge VMs, you must use gVNIC driver version 1.4.0rc3 or later. This driver version is available on the Container-Optimized OS. The first virtual NIC is used as the primary NIC for general networking and storage; each of the other four virtual NICs is NUMA-aligned with two of the eight GPUs on the same PCIe switch.
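After you create a VM and connect to it, you can optionally confirm the gVNIC driver version from inside the guest. This is a sketch only and assumes that the ethtool utility is available on your image:

  ethtool -i eth0 | grep '^version'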
Based on the provisioning model that you want to use to create your VM, select one of the following options:
Standard
gcloud compute instances create VM_NAME \
    --project=PROJECT_ID \
    --zone=ZONE \
    --machine-type=MACHINE_TYPE \
    --maintenance-policy=TERMINATE --restart-on-failure \
    --image-family=cos-121-lts \
    --image-project=cos-cloud \
    --boot-disk-size=BOOT_DISK_SIZE \
    --metadata=cos-update-strategy=update_disabled \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-mgmt-net,subnet=NETWORK_NAME_PREFIX-mgmt-sub \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-1,subnet=NETWORK_NAME_PREFIX-data-sub-1,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-2,subnet=NETWORK_NAME_PREFIX-data-sub-2,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-3,subnet=NETWORK_NAME_PREFIX-data-sub-3,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-4,subnet=NETWORK_NAME_PREFIX-data-sub-4,no-address
Replace the following:
- VM_NAME: the name of your VM.
- PROJECT_ID: the ID of your project.
- ZONE: a zone that supports your machine type.
- MACHINE_TYPE: the machine type for the VM. Specify either a3-highgpu-8g or a3-edgegpu-8g.
- BOOT_DISK_SIZE: the size of the boot disk in GB, for example 50.
- NETWORK_NAME_PREFIX: the name prefix to use for the VPC networks and subnets.
Spot
gcloud compute instances create VM_NAME \
    --project=PROJECT_ID \
    --zone=ZONE \
    --machine-type=MACHINE_TYPE \
    --maintenance-policy=TERMINATE --restart-on-failure \
    --image-family=cos-121-lts \
    --image-project=cos-cloud \
    --boot-disk-size=BOOT_DISK_SIZE \
    --metadata=cos-update-strategy=update_disabled \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-mgmt-net,subnet=NETWORK_NAME_PREFIX-mgmt-sub \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-1,subnet=NETWORK_NAME_PREFIX-data-sub-1,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-2,subnet=NETWORK_NAME_PREFIX-data-sub-2,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-3,subnet=NETWORK_NAME_PREFIX-data-sub-3,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-4,subnet=NETWORK_NAME_PREFIX-data-sub-4,no-address \
    --provisioning-model=SPOT \
    --instance-termination-action=TERMINATION_ACTION
Replace the following:
- VM_NAME: the name of your VM.
- PROJECT_ID: the ID of your project.
- ZONE: a zone that supports your machine type.
- MACHINE_TYPE: the machine type for the VM. Specify either a3-highgpu-8g or a3-edgegpu-8g.
- BOOT_DISK_SIZE: the size of the boot disk in GB, for example 50.
- NETWORK_NAME_PREFIX: the name prefix to use for the VPC networks and subnets.
- TERMINATION_ACTION: whether to stop or delete the VM on preemption. Specify one of the following values:
  - To stop the VM: STOP
  - To delete the VM: DELETE
Flex-start
gcloud compute instances create VM_NAME \
    --project=PROJECT_ID \
    --zone=ZONE \
    --machine-type=MACHINE_TYPE \
    --maintenance-policy=TERMINATE --restart-on-failure \
    --image-family=cos-121-lts \
    --image-project=cos-cloud \
    --boot-disk-size=BOOT_DISK_SIZE \
    --metadata=cos-update-strategy=update_disabled \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-mgmt-net,subnet=NETWORK_NAME_PREFIX-mgmt-sub \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-1,subnet=NETWORK_NAME_PREFIX-data-sub-1,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-2,subnet=NETWORK_NAME_PREFIX-data-sub-2,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-3,subnet=NETWORK_NAME_PREFIX-data-sub-3,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-4,subnet=NETWORK_NAME_PREFIX-data-sub-4,no-address \
    --provisioning-model=FLEX_START \
    --instance-termination-action=TERMINATION_ACTION \
    --max-run-duration=RUN_DURATION \
    --request-valid-for-duration=VALID_FOR_DURATION \
    --reservation-affinity=none
Replace the following:
- VM_NAME: the name of your VM.
- PROJECT_ID: the ID of your project.
- ZONE: a zone that supports your machine type.
- MACHINE_TYPE: the machine type for the VM. Specify either a3-highgpu-8g or a3-edgegpu-8g.
- BOOT_DISK_SIZE: the size of the boot disk in GB, for example 50.
- NETWORK_NAME_PREFIX: the name prefix to use for the VPC networks and subnets.
- TERMINATION_ACTION: whether to stop or delete the VM at the end of its run duration. Specify one of the following values:
  - To stop the VM: STOP
  - To delete the VM: DELETE
- RUN_DURATION: the maximum time that the VM runs before Compute Engine stops or deletes it. You must format the value as the number of days, hours, minutes, or seconds followed by d, h, m, and s respectively. For example, a value of 30m defines a time of 30 minutes, and a value of 1h2m3s defines a time of one hour, two minutes, and three seconds. You can specify a value between 10 minutes and seven days.
- VALID_FOR_DURATION: the maximum time to wait for provisioning your requested resources. You must format the value as the number of days, hours, minutes, or seconds followed by d, h, m, and s respectively. Based on the zonal requirements for your workload, specify one of the following durations to help increase your chances that your VM creation request succeeds:
  - If your workload requires you to create the VM in a specific zone, then specify a duration between 90 seconds (90s) and two hours (2h). Longer durations give you higher chances of obtaining resources.
  - If the VM can run in any zone within the region, then specify a duration of zero seconds (0s). This value specifies that Compute Engine only allocates resources if they are immediately available. If the creation request fails because resources are unavailable, then retry the request in a different zone.
Reservation-bound
Important: The reservation-bound provisioning model doesn't support the a3-edgegpu-8g machine type. To create a VM that uses an a3-edgegpu-8g machine type, use a different provisioning model.

gcloud compute instances create VM_NAME \
    --project=PROJECT_ID \
    --zone=ZONE \
    --machine-type=MACHINE_TYPE \
    --maintenance-policy=TERMINATE --restart-on-failure \
    --image-family=cos-121-lts \
    --image-project=cos-cloud \
    --boot-disk-size=BOOT_DISK_SIZE \
    --metadata=cos-update-strategy=update_disabled \
    --scopes=https://www.googleapis.com/auth/cloud-platform \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-mgmt-net,subnet=NETWORK_NAME_PREFIX-mgmt-sub \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-1,subnet=NETWORK_NAME_PREFIX-data-sub-1,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-2,subnet=NETWORK_NAME_PREFIX-data-sub-2,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-3,subnet=NETWORK_NAME_PREFIX-data-sub-3,no-address \
    --network-interface=nic-type=GVNIC,network=NETWORK_NAME_PREFIX-data-net-4,subnet=NETWORK_NAME_PREFIX-data-sub-4,no-address \
    --provisioning-model=RESERVATION_BOUND \
    --instance-termination-action=TERMINATION_ACTION \
    --reservation-affinity=specific \
    --reservation=RESERVATION_URL
Replace the following:
- VM_NAME: the name of your VM.
- PROJECT_ID: the ID of your project.
- ZONE: a zone that supports your machine type.
- MACHINE_TYPE: the machine type for the VM. Specify either a3-highgpu-8g or a3-edgegpu-8g.
- BOOT_DISK_SIZE: the size of the boot disk in GB, for example 50.
- NETWORK_NAME_PREFIX: the name prefix to use for the VPC networks and subnets.
- TERMINATION_ACTION: whether to stop or delete the VM at the end of the reservation period. Specify one of the following values:
  - To stop the VM: STOP
  - To delete the VM: DELETE
- RESERVATION_URL: the URL of the reservation that you want to consume. Specify one of the following values:
  - If you created the reservation in the same project: example-reservation
  - If the reservation is in a different project, and your project can use it: projects/PROJECT_ID/reservations/example-reservation
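Optionally, after a VM is created, you can confirm that it has the expected five network interfaces (one management interface and four data interfaces). This is a quick check, not a required step:

  gcloud compute instances describe VM_NAME \
      --project=PROJECT_ID \
      --zone=ZONE \
      --format="value(networkInterfaces[].name)"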
Install GPU drivers
On each A3 High or Edge VM, complete the following steps.
- Install the NVIDIA GPU drivers by running the following command:
sudo cos-extensions install gpu -- --version=latest
- Re-mount the path by running the following command:
sudo mount --bind /var/lib/nvidia /var/lib/nvidia
sudo mount -o remount,exec /var/lib/nvidia
Give the NICs access to the GPUs
On each A3 High or Edge VM, give the NICs access to the GPUs by completing the following steps:
- Configure the registry.
- If you are using Container Registry, run the following command:
docker-credential-gcr configure-docker
- If you are using Artifact Registry, run the following command:
docker-credential-gcr configure-docker --registries us-docker.pkg.dev
- Configure the receive data path manager. A management service, GPUDirect-TCPX Receive Data Path Manager, needs to run alongside the applications that use GPUDirect-TCPX. To start the service on each Container-Optimized OS VM, run the following command:
docker run --pull=always --rm \
    --name receive-datapath-manager \
    --detach \
    --privileged \
    --cap-add=NET_ADMIN --network=host \
    --volume /var/lib/nvidia/lib64:/usr/local/nvidia/lib64 \
    --device /dev/nvidia0:/dev/nvidia0 \
    --device /dev/nvidia1:/dev/nvidia1 \
    --device /dev/nvidia2:/dev/nvidia2 \
    --device /dev/nvidia3:/dev/nvidia3 \
    --device /dev/nvidia4:/dev/nvidia4 \
    --device /dev/nvidia5:/dev/nvidia5 \
    --device /dev/nvidia6:/dev/nvidia6 \
    --device /dev/nvidia7:/dev/nvidia7 \
    --device /dev/nvidia-uvm:/dev/nvidia-uvm \
    --device /dev/nvidiactl:/dev/nvidiactl \
    --env LD_LIBRARY_PATH=/usr/local/nvidia/lib64 \
    --volume /run/tcpx:/run/tcpx \
    --entrypoint /tcpgpudmarxd/build/app/tcpgpudmarxd \
    us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpx/tcpgpudmarxd \
    --gpu_nic_preset a3vm --gpu_shmem_type fd --uds_path "/run/tcpx" --setup_param "--verbose 128 2 0"
- Verify that the receive-datapath-manager container started.

  docker container logs --follow receive-datapath-manager

  The output should resemble the following:

  I0000 00:00:1687813309.406064 1 rx_rule_manager.cc:174] Rx Rule Manager server(s) started...

- To stop viewing the logs, press ctrl-c.
- Install IP table rules.

  sudo iptables -I INPUT -p tcp -m tcp -j ACCEPT
- Configure the NVIDIA Collective Communications Library (NCCL) and GPUDirect-TCPX plugin.
A specific combination of NCCL library version and GPUDirect-TCPX plugin binary is required to use NCCL with GPUDirect-TCPX support. Google Cloud provides packages that meet this requirement.

To install the Google Cloud package, run the following commands:

  docker run --rm -v /var/lib:/var/lib us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpx/nccl-plugin-gpudirecttcpx install --install-nccl
  sudo mount --bind /var/lib/tcpx /var/lib/tcpx
  sudo mount -o remount,exec /var/lib/tcpx

If these commands are successful, the libnccl-net.so and libnccl.so files are placed in the /var/lib/tcpx/lib64 directory (you can confirm this with the quick check after this procedure).
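A minimal way to confirm the install, based on the path noted in the previous step:

  ls /var/lib/tcpx/lib64/libnccl*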
Run tests
On each A3 High or Edge VM, run an NCCL test by completing the following steps:
- Start the container.
#!/bin/bash

function run_tcpx_container() {
  docker run \
    -u 0 --network=host \
    --cap-add=IPC_LOCK \
    --userns=host \
    --volume /run/tcpx:/tmp \
    --volume /var/lib/nvidia/lib64:/usr/local/nvidia/lib64 \
    --volume /var/lib/tcpx/lib64:/usr/local/tcpx/lib64 \
    --shm-size=1g --ulimit memlock=-1 --ulimit stack=67108864 \
    --device /dev/nvidia0:/dev/nvidia0 \
    --device /dev/nvidia1:/dev/nvidia1 \
    --device /dev/nvidia2:/dev/nvidia2 \
    --device /dev/nvidia3:/dev/nvidia3 \
    --device /dev/nvidia4:/dev/nvidia4 \
    --device /dev/nvidia5:/dev/nvidia5 \
    --device /dev/nvidia6:/dev/nvidia6 \
    --device /dev/nvidia7:/dev/nvidia7 \
    --device /dev/nvidia-uvm:/dev/nvidia-uvm \
    --device /dev/nvidiactl:/dev/nvidiactl \
    --env LD_LIBRARY_PATH=/usr/local/nvidia/lib64:/usr/local/tcpx/lib64 \
    "$@"
}

The preceding command does the following:

- Mounts NVIDIA devices from /dev into the container
- Sets the network namespace of the container to the host
- Sets the user namespace of the container to the host
- Adds CAP_IPC_LOCK to the capabilities of the container
- Mounts /tmp of the host to /tmp of the container
- Mounts the installation path of the NCCL and GPUDirect-TCPX NCCL plugin into the container and adds the mounted path to LD_LIBRARY_PATH
- After you start the container, applications that use NCCL can run from inside the container. For example, to run the run-allgather test, complete the following steps:
  - On each A3 High or Edge VM, run the following:

    run_tcpx_container -it --rm us-docker.pkg.dev/gce-ai-infra/gpudirect-tcpx/nccl-plugin-gpudirecttcpx shell

  - On one VM, run the following commands:
    - Set up the connection between the VMs. Replace VM-0 and VM-1 with the names of each VM.

      /scripts/init_ssh.sh VM-0 VM-1
      pushd /scripts && /scripts/gen_hostfiles.sh VM-0 VM-1; popd

      This creates a /scripts/hostfiles2 directory on each VM.

    - Run the script.

      /scripts/run-allgather.sh 8 eth1,eth2,eth3,eth4 1M 512M 2

      The run-allgather script takes about two minutes to run. At the end of the logs, you'll see the all-gather results. If you see the following line in your NCCL logs, it verifies that GPUDirect-TCPX initialized successfully.

      NCCL INFO NET/GPUDirectTCPX ver. 3.1.1.
Multi-Instance GPU
A Multi-Instance GPU partitions a single NVIDIA H100 GPU within the same VM into as many as seven independent GPU instances. These instances run simultaneously, each with its own memory, cache, and streaming multiprocessors. This setup enables the NVIDIA H100 GPU to deliver consistent quality-of-service (QoS) at up to 7x higher utilization compared to earlier GPU models.

You can create up to seven Multi-Instance GPUs. With the H100 80GB GPUs, each Multi-Instance GPU is allocated 10 GB of memory.

For more information about using Multi-Instance GPUs, see the NVIDIA Multi-Instance GPU User Guide.
To create Multi-Instance GPUs, complete the following steps:
Create your A3 Mega, A3 High, or A3 Edge instances.
Install the GPU drivers.
Enable MIG mode. For instructions, see Enable MIG.

Configure your GPU partitions. For instructions, see Work with GPU partitions.
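The exact commands depend on your driver version and the partition sizes you want, so treat the following as a minimal sketch of the flow rather than a definitive procedure; PROFILE_ID is a placeholder that you look up on your own VM.

  # Enable MIG mode on GPU 0 (repeat per GPU; a GPU reset might be required).
  sudo nvidia-smi -i 0 -mig 1

  # List the GPU instance profiles that your driver reports, for example the
  # 1g.10gb profile on H100 80GB, and note the profile ID to use.
  sudo nvidia-smi mig -lgip

  # Create GPU instances (and their compute instances) from a profile ID.
  sudo nvidia-smi mig -cgi PROFILE_ID,PROFILE_ID -C

  # Confirm that the MIG devices are visible.
  nvidia-smi -L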