Use higher network bandwidth
This page explains how to create A2, G2, and N1 instances that use higher network bandwidths. To learn how to use higher network bandwidths for other accelerator-optimized machine series, see Create high bandwidth GPU machines.

You can use higher network bandwidths, of 100 Gbps or more, to improve the performance of distributed workloads running on your GPU VMs. Higher network bandwidths are available for A2, G2, and N1 VMs with attached GPUs on Compute Engine as follows:

- For N1 general-purpose VMs that have T4 or V100 GPUs attached, you can get a maximum network bandwidth of up to 100 Gbps, based on the combination of GPU and vCPU count.
- For A2 and G2 accelerator-optimized VMs, you can get a maximum network bandwidth of up to 100 Gbps, based on the machine type.

To review the configurations or machine types that support these higher network bandwidth rates, see Network bandwidths and GPUs.

For general network bandwidth information on Compute Engine, see Network bandwidth.
Overview
To use the higher network bandwidths available to each GPU VM, complete the following recommended steps:

- Create your GPU VM by using an OS image that supports Google Virtual NIC (gVNIC).
- Optional: Install Fast Socket. Fast Socket improves NCCL performance on 100 Gbps or higher networks by reducing the contention between multiple TCP connections. Some Deep Learning VM Images (DLVM) have Fast Socket preinstalled.
Use Deep Learning VM Images
You can create your VMs using any GPU-supported image from the Deep Learning VM Images project. All GPU-supported DLVM images have the GPU driver, ML software, and gVNIC preinstalled. For a list of DLVM images, see Choosing an image.

If you want to use Fast Socket, choose a DLVM image such as tf-latest-gpu-debian-10 or tf-latest-gpu-ubuntu-1804.
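If you're not sure which DLVM images are currently available, you can list the image families in the public deeplearning-platform-release project with the Google Cloud CLI. This is a minimal sketch; it assumes gcloud is installed and authenticated, and the filter expression is only an illustration that narrows the list to TensorFlow GPU families:

```shell
# List Deep Learning VM image families that match "tf-latest".
# Assumes the gcloud CLI is installed and authenticated.
gcloud compute images list \
    --project=deeplearning-platform-release \
    --filter="family ~ tf-latest" \
    --format="value(family)" | sort -u
```

Drop the `--filter` flag to see every image family in the project.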
Create VMs that use higher network bandwidths
For higher network bandwidths, it is recommended that you enable Google Virtual NIC (gVNIC). For more information, see Using Google Virtual NIC.
To create a VM that has attached GPUs and a higher network bandwidth, completethe following:
- Review the maximum network bandwidth available for each machine type that has attached GPUs.

Create your GPU VM. The following examples show how to create A2 VMs and N1 VMs that have attached V100 GPUs.

In these examples, VMs are created by using the Google Cloud CLI. However, you can also use either the Google Cloud console or the Compute Engine API to create these VMs. For more information about creating GPU VMs, see Create a VM with attached GPUs.
A2 (A100)
For example, to create a VM that has a maximum bandwidth of 100 Gbps, has eight A100 GPUs attached, and uses the tf-latest-gpu DLVM image, run the following command:

```
gcloud compute instances create VM_NAME \
    --project=PROJECT_ID \
    --zone=ZONE \
    --machine-type=a2-highgpu-8g \
    --maintenance-policy=TERMINATE --restart-on-failure \
    --image-family=tf-latest-gpu \
    --image-project=deeplearning-platform-release \
    --boot-disk-size=200GB \
    --network-interface=nic-type=GVNIC \
    --metadata="install-nvidia-driver=True,proxy-mode=project_editors" \
    --scopes=https://www.googleapis.com/auth/cloud-platform
```
Replace the following:
- VM_NAME: the name of your VM
- PROJECT_ID: your project ID
- ZONE: the zone for the VM. This zone must support the specified GPU type. For more information about zones, see GPU regions and zones availability.
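After the instance is created, you can optionally confirm that it is using gVNIC. This is a hedged sketch, not a required step; it assumes the gcloud CLI is authenticated and reads the nicType field of the instance's first network interface:

```shell
# Confirm the NIC type of the first network interface (expect GVNIC).
# Replace VM_NAME and ZONE with the values used when creating the VM.
gcloud compute instances describe VM_NAME \
    --zone=ZONE \
    --format="value(networkInterfaces[0].nicType)"
```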
N1 (V100)
For example, to create a VM that has a maximum bandwidth of 100 Gbps, has eight V100 GPUs attached, and uses the tf-latest-gpu DLVM image, run the following command:

```
gcloud compute instances create VM_NAME \
    --project PROJECT_ID \
    --custom-cpu 96 \
    --custom-memory 624 \
    --image-project=deeplearning-platform-release \
    --image-family=tf-latest-gpu \
    --accelerator type=nvidia-tesla-v100,count=8 \
    --maintenance-policy TERMINATE \
    --metadata="install-nvidia-driver=True" \
    --boot-disk-size 200GB \
    --network-interface=nic-type=GVNIC \
    --zone=ZONE
```

Replace VM_NAME, PROJECT_ID, and ZONE as described in the preceding example.
If you are not using GPU-supported Deep Learning VM Images or Container-Optimized OS, install GPU drivers. For more information, see Installing GPU drivers.

Optional: On the VM, install Fast Socket.

After you set up the VM, you can verify the network bandwidth.

Note: To achieve the higher network bandwidth rates, your applications must use multiple network streams.
Install Fast Socket
The NVIDIA Collective Communications Library (NCCL) is used by deep learning frameworks such as TensorFlow, PyTorch, and Horovod for multi-GPU and multi-node training.

Fast Socket is a Google proprietary network transport for NCCL. On Compute Engine, Fast Socket improves NCCL performance on 100 Gbps networks by reducing the contention between multiple TCP connections. For more information about working with NCCL, see the NCCL user guide.

Current evaluation shows that Fast Socket improves all-reduce throughput by 30%–60%, depending on the message size.

To set up a Fast Socket environment, you can use a Deep Learning VM image that has Fast Socket preinstalled, or you can manually install Fast Socket on a Linux VM. To check whether Fast Socket is preinstalled, see Verifying that Fast Socket is enabled.

Note: Fast Socket is not supported on Windows VMs.

Before you install Fast Socket on a Linux VM, you need to install NCCL. For detailed instructions, see the NVIDIA NCCL documentation.
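As one illustrative path (not the only one), NCCL can be installed on an Ubuntu VM from NVIDIA's CUDA network repository. The sketch below assumes Ubuntu 20.04 on x86_64; adjust the repository URL for your distribution, and treat the NVIDIA NCCL documentation as the authoritative source:

```shell
# Add NVIDIA's CUDA repository keyring (assumes Ubuntu 20.04, x86_64),
# then install the NCCL runtime and development packages.
wget https://developer.download.nvidia.com/compute/cuda/repos/ubuntu2004/x86_64/cuda-keyring_1.0-1_all.deb
sudo dpkg -i cuda-keyring_1.0-1_all.deb
sudo apt-get update
sudo apt-get install -y libnccl2 libnccl-dev
```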
CentOS/RHEL
To download and install Fast Socket on a CentOS or RHEL VM, complete the following steps:
Add the package repository and import public keys.
```
sudo tee /etc/yum.repos.d/google-fast-socket.repo << EOM
[google-fast-socket]
name=Fast Socket Transport for NCCL
baseurl=https://packages.cloud.google.com/yum/repos/google-fast-socket
enabled=1
gpgcheck=0
repo_gpgcheck=0
gpgkey=https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
EOM
```
Install Fast Socket.
sudo yum install google-fast-socket
Verify that Fast Socket is enabled.
SLES
To download and install Fast Socket on an SLES VM, complete the following steps:
Add the package repository.
sudo zypper addrepo https://packages.cloud.google.com/yum/repos/google-fast-socket google-fast-socket
Add repository keys.
sudo rpm --import https://packages.cloud.google.com/yum/doc/yum-key.gpg https://packages.cloud.google.com/yum/doc/rpm-package-key.gpg
Install Fast Socket.
sudo zypper install google-fast-socket
Verify that Fast Socket is enabled.
Debian/Ubuntu
To download and install Fast Socket on a Debian or Ubuntu VM, complete the following steps:
Add the package repository.
echo "deb https://packages.cloud.google.com/apt google-fast-socket main" | sudo tee /etc/apt/sources.list.d/google-fast-socket.list
Add repository keys.
curl https://packages.cloud.google.com/apt/doc/apt-key.gpg | sudo apt-key add -
Install Fast Socket.
sudo apt update && sudo apt install google-fast-socket
Verify that Fast Socket is enabled.
Verifying that Fast Socket is enabled
On your VM, complete the following steps:
Locate the NCCL home directory.
sudo ldconfig -p | grep nccl
For example, on a DLVM image, you get the following output:
```
libnccl.so.2 (libc6,x86-64) => /usr/local/nccl2/lib/libnccl.so.2
libnccl.so (libc6,x86-64) => /usr/local/nccl2/lib/libnccl.so
libnccl-net.so (libc6,x86-64) => /usr/local/nccl2/lib/libnccl-net.so
```
This shows that the NCCL home directory is /usr/local/nccl2.

Check that NCCL loads the Fast Socket plugin. To check, you need to download and build the NCCL test package. To download the test package, run the following command:
```
git clone https://github.com/NVIDIA/nccl-tests.git && \
cd nccl-tests && make NCCL_HOME=NCCL_HOME_DIRECTORY
```
Replace NCCL_HOME_DIRECTORY with the NCCL home directory.

From the nccl-tests directory, run the all_reduce_perf process:

```
NCCL_DEBUG=INFO build/all_reduce_perf
```
If Fast Socket is enabled, the FastSocket plugin initialized message displays in the output log:

```
# nThread 1 nGpus 1 minBytes 33554432 maxBytes 33554432 step: 1048576(bytes) warmup iters: 5 iters: 20 validation: 1
#
# Using devices
#   Rank  0 Pid  63324 on fast-socket-gpu device  0 [0x00] Tesla V100-SXM2-16GB
.....
fast-socket-gpu:63324:63324 [0] NCCL INFO NET/FastSocket : Flow placement enabled.
fast-socket-gpu:63324:63324 [0] NCCL INFO NET/FastSocket : queue skip: 0
fast-socket-gpu:63324:63324 [0] NCCL INFO NET/FastSocket : Using [0]ens12:10.240.0.24
fast-socket-gpu:63324:63324 [0] NCCL INFO NET/FastSocket plugin initialized
......
```
Check network bandwidth
This section explains how to check network bandwidth for A3 Mega, A3 High, A3 Edge, A2, G2, and N1 instances. When working with high bandwidth GPUs, you can use a network traffic tool, such as iPerf2, to measure the network bandwidth.
To check bandwidth speeds, you need at least two VMs that have attachedGPUs and can both support the bandwidth speed that you are testing.
Use iPerf to perform the benchmark on Debian-based systems.
Note: Ensure that you are using iPerf version 2 and not version 3; iPerf version 3 does not support multi-threading (by design), which can affect your results when running multiple streams.

Create two VMs that can support the required bandwidth speeds.
Once both VMs are running, use SSH to connect to one of the VMs.
```
gcloud compute ssh VM_NAME \
    --project=PROJECT_ID
```
Replace the following:
- VM_NAME: the name of the first VM
- PROJECT_ID: your project ID
On the first VM, complete the following steps:
Install iPerf.

```
sudo apt-get update && sudo apt-get install iperf
```
Get the internal IP address for this VM. Keep track of it by writing it down.
ip a
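Instead of reading the address out of the `ip a` output by hand, you can capture the VM's primary internal IP into a shell variable. This is a convenience sketch; it assumes a typical single-NIC Linux guest, where the first address reported by `hostname -I` is the internal IP:

```shell
# Capture the first address reported by hostname -I (on a typical
# single-NIC Compute Engine VM this is the internal IP).
INTERNAL_IP=$(hostname -I | awk '{print $1}')
echo "Internal IP: ${INTERNAL_IP}"
```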
Start up the iPerf server.
iperf -s
This starts up a server listening for connections in order to perform thebenchmark. Leave this running for the duration of the test.
From a new client terminal, connect to the second VM using SSH.
```
gcloud compute ssh VM_NAME \
    --project=PROJECT_ID
```
Replace the following:
- VM_NAME: the name of the second VM
- PROJECT_ID: your project ID
On the second VM, complete the following steps:
Install iPerf.
sudo apt-get update && sudo apt-get install iperf
Run the iperf test and specify the first VM's IP address as the target.
Note: The order of the arguments is important.

```
iperf -t 30 -c internal_ip_of_instance_1 -P 16
```
This executes a 30-second test and reports the measured bandwidth for each stream and the aggregate. If iPerf is not able to reach the other VM, you might need to adjust the network or firewall settings on the VMs, either directly or through the Google Cloud console.
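If the client cannot reach the server, a common cause is that no firewall rule allows the iPerf port (TCP 5001 by default for iPerf2) on the VPC network. The following is a hedged example: the rule name allow-iperf is a placeholder, and the network name and source range should be replaced with your actual VPC network and internal IP range (10.128.0.0/9 covers the default auto-mode subnets):

```shell
# Allow iPerf2 traffic (default TCP port 5001) between VMs on the network.
# "allow-iperf" is a placeholder rule name; adjust network and ranges.
gcloud compute firewall-rules create allow-iperf \
    --network=default \
    --direction=INGRESS \
    --allow=tcp:5001 \
    --source-ranges=10.128.0.0/9
```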
When you use the maximum available bandwidth of 100 Gbps or 1,000 Gbps (A3 Mega, A3 High, or A3 Edge), keep the following considerations in mind:

- Due to packet header overheads for network protocols such as Ethernet, IP, and TCP on the virtualization stack, the throughput, as measured by netperf, saturates at around 90 Gbps or 800 Gbps (A3 Mega, A3 High, or A3 Edge). This usable throughput is generally known as goodput.
- TCP is able to achieve the 100 or 1,000 Gbps network speed. Other protocols, such as UDP, are slower.
- Due to factors such as protocol overhead and network congestion, end-to-end performance of data streams might be slightly lower.
- You need to use multiple TCP streams to achieve maximum bandwidth between VM instances. Google recommends 4 to 16 streams. At 16 flows, you'll frequently maximize the throughput. Depending on your application and software stack, you might need to adjust settings for your application or your code to set up multiple streams.
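To see how stream count affects throughput in practice, you can sweep the iPerf2 -P value from the client VM and compare the aggregate results. This is a sketch, assuming the iPerf server from the earlier steps is still running and internal_ip_of_instance_1 is its internal IP:

```shell
# Sweep the number of parallel TCP streams and compare throughput.
# Assumes "iperf -s" is already running on the server VM.
for streams in 1 4 8 16; do
    echo "=== ${streams} stream(s) ==="
    iperf -t 10 -c internal_ip_of_instance_1 -P "${streams}" | tail -n 1
done
```

With multiple streams, the final [SUM] line reports the aggregate bandwidth across all connections.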
What's next?
- To monitor GPU performance, see Monitoring GPU performance.
- To handle GPU host maintenance, see Handling GPU host maintenance events.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.