Patterns for using multiple host NICs (GPU VMs)
Some accelerator-optimized machines, including A3 Ultra, A4, and A4X, have two host network interfaces in addition to the MRDMA interfaces on these machines. On the host, these are Titanium IPUs, which are plugged into separate CPU sockets and non-uniform memory access (NUMA) nodes. These IPUs are available inside the VM as Google Virtual NICs (gVNICs), and provide network bandwidth for storage activities such as checkpointing, loading training data, loading models, and other general networking needs. The machine's NUMA topology, including that of the gVNICs, is visible to the guest operating system (OS).
This document describes best practices for using the two gVNICs on these machines.
Overview
In general, we recommend that you use the following configurations, regardless of how you plan to use multiple host NICs:
- Network settings: Each gVNIC must have a unique VPC network. When setting up a VPC, consider the following:
  - Use a large maximum transmission unit (MTU) for each VPC network. 8896 is the maximum supported MTU and a recommended choice. The ingress performance for some workloads might be slowed down because the system drops incoming data packets on the receiver side. You can use the ethtool tool to check for this issue (see the example after this list). In this scenario, it can be helpful to adjust the TCP MSS, interface MTU, or VPC MTU to allow for efficient data allocation from the page cache, which allows the incoming layer 2 frame to fit within two 4 KB buffers.
- Application settings
  - NUMA-align the application. Use CPU cores, memory allocations, and a network interface from the same NUMA node. If you are running a dedicated instance of the application to use a specific NUMA node or network interface, you can use tools like numactl to attach the application's CPU and memory resources to a specific NUMA node (see the example after this list).
- Operating system settings
  - Enable TCP segmentation offload (TSO) and large receive offload (LRO).
  - For each gVNIC interface, ensure that the SMP affinity is set up so that its interrupt requests (IRQs) are handled on the same NUMA node as the interface, and spread interrupts out across cores. If you're running a Google-supplied guest OS image, this process happens automatically using the google_set_multiqueue script.
  - Evaluate settings like RFS, RPS, and XPS to see if they might be helpful for your workload.
  - For A4X, NVIDIA recommends disabling automatic NUMA scheduling.
- Linux kernel bonding is not supported for the gVNICs on these machines.
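For example, the following commands sketch how you might check for receiver-side packet drops and NUMA-align a process. The interface name, NUMA node ID, and application path are illustrative; substitute your own values.

```bash
# Check a gVNIC's drop counters; steadily rising rx drop counters can
# indicate the receiver-side issue described above. (eth1 is illustrative.)
ethtool -S eth1 | grep -i drop

# Confirm the MTU currently configured on the interface.
ip link show eth1

# Discover the NUMA topology, then run an application with its CPU and
# memory resources bound to the same NUMA node as the chosen interface.
# (Node 1 and ./my_app are illustrative.)
numactl --hardware
numactl --cpunodebind=1 --membind=1 -- ./my_app
```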
Patterns for using multiple host NICs
This section outlines general patterns for using multiple host NICs on Google Cloud.
| Pattern | Supported process layout | Deployment path: Compute Engine | Deployment path: GKE | Deployment path: Slurm | Notes |
|---|---|---|---|---|---|
| Change application to use a specific interface | Process shard per interface | Yes | Yes | Yes | Requires code changes to the application |
| Change application to use both interfaces | Dual-interface process | Yes | Yes | Yes | Requires code changes to the application |
| Use a dedicated network namespace for specific applications | Process shard per interface | Yes | Yes (privileged containers only) | No | |
| Map an entire container's traffic to a single interface | All container traffic mapped to one interface | Yes | Yes | No | |
| Peer the VPCs and let the system load-balance sessions across interfaces | Dual-interface process | Yes* | Yes* | Yes* | Challenging or impossible to NUMA-align. Requires Linux kernel 6.16 or later.* |
| Shard traffic across networks | Dual-interface process; process shard per interface | Yes* | Yes* | Yes* | Might require code changes to NUMA-align if running a dual-interface process. |
| Use SNAT to choose the source interface | Dual-interface process; process shard per interface | Yes | Yes (setup requires administrator privileges) | Yes (setup requires administrator privileges) | Can be more challenging to configure correctly |
* This option is not generally recommended but might be useful for limited workloads on x86 (A3 Ultra and A4) platforms.
Change application to use a specific interface
Requirements:
- This method requires code changes to your application.
- Requires permissions for one or more of the following methods:
  - bind(): only requires special permissions if using a privileged source port.
  - SO_BINDTODEVICE: requires the CAP_NET_RAW capability.
- This method can require you to modify your kernel routing table to establish routes and to prevent asymmetric routing.
High-level overview
Note: Some applications and libraries are already capable of and optimized for multi-NUMA or multi-socket scenarios, including multiple network interfaces. If you're using third-party or open source software, be sure to check the documentation for that software.

With this pattern, you complete the following:
- Add network interface binding to your application's source code by using one of the following options:
  - Use bind() to bind a socket to a particular source IP address.
  - Use the SO_BINDTODEVICE socket option to bind a socket to a particular network interface.
- Modify the kernel routing table as needed to ensure a route exists from the source network interface to the destination address. In addition, routes might be required to prevent asymmetric routing. We recommend that you configure policy routing as described in Configure routing for an additional network interface (see the sketch after this list).
- You can also use the numactl command to run your application. In this approach, you use the memory and CPUs that are on the same NUMA node as your chosen network interface.
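The following is a minimal policy-routing sketch for a secondary interface. It assumes the secondary interface is eth1; the addresses, prefix, and table number are illustrative and must match your VM's configuration.

```bash
ETH1_IP=192.168.2.27   # internal IP of the secondary interface (illustrative)
GATEWAY_IP=192.168.2.1 # gateway of the secondary subnet (illustrative)
RT_TABLE=100           # dedicated routing table ID (illustrative)

# Route traffic that originates from eth1's address through eth1's gateway,
# which prevents replies from egressing the primary interface (asymmetric routing).
ip route add default via "$GATEWAY_IP" dev eth1 table "$RT_TABLE"
ip rule add from "$ETH1_IP"/32 table "$RT_TABLE"
```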
After you complete the preceding steps, instances of your application run using a specific network interface.
Change application to use both interfaces
Requirements:
- This method requires code changes to your application.
- You require permissions for one or more of the following methods:
  - bind(): only requires special permissions if using a privileged source port.
  - SO_BINDTODEVICE: requires the CAP_NET_RAW capability.
- This method can require you to modify your kernel routing table to establish routes and to prevent asymmetric routing.
High-level overview
Note: Some applications and libraries are already capable of and optimized for multi-NUMA or multi-socket scenarios along with multiple network interfaces. If using third-party or open source software, be sure to check the documentation for that software.

To implement this pattern, you do the following:
- Add network interface binding to your application's source code by using one of the following options:
  - Use the bind() system call to bind a socket to a particular source IP address.
  - Use the SO_BINDTODEVICE socket option to bind a socket to a particular network interface.
- If your application is acting as the client, you will need to create a separate client socket for each source network interface.
- Modify the kernel routing table as needed to ensure a route exists from the source network interface to the destination address. In addition, you might also require routes to prevent asymmetric routing. We recommend that you configure policy routing as described in Configure routing for an additional network interface.
- We recommend that you partition network activity into threads that run on the same NUMA node as the gVNIC interface. One common way of requesting a specific NUMA node for a thread is to call pthread_setaffinity_np.
  - Since the application utilizes resources on multiple NUMA nodes, avoid using numactl, or ensure your numactl command includes the NUMA nodes of all network interfaces used by your application (see the example after this list).
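For example, a dual-interface process can be launched so that its allowed CPUs and memory span both NUMA nodes that host the gVNICs. The node IDs and application path below are illustrative; check your topology with numactl --hardware.

```bash
# Allow the process to use CPUs and memory on both NUMA nodes (0 and 1 here),
# matching the NUMA nodes of the two gVNICs. The application's own threads
# can then pin themselves per node, for example with pthread_setaffinity_np.
numactl --cpunodebind=0,1 --membind=0,1 -- ./my_app
```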
Use a dedicated network namespace for specific applications
Requirements:
- Requires the CAP_SYS_ADMIN capability.
- Not compatible with GKE Autopilot.
- If using GKE, you must have a privileged container.
This section describes patterns that you can use to create a network namespace that uses a secondary network interface. The right pattern for your workload depends on your specific scenario. The approaches that use a virtual switch or IPvlan are better suited to cases where multiple applications need to use the secondary interface from different network namespaces.
High-level overview: moving the secondary interface into a dedicated network namespace
This pattern involves creating a network namespace, moving the secondary gVNIC interface into the new namespace, and then running the application from this namespace. This pattern might be less complicated to set up and tune compared to using a virtual switch. However, applications outside of the new network namespace will be unable to access the secondary gVNIC.
The following example shows a series of commands that can be used to move eth1 into the new network namespace called second.
```bash
ip netns add second
ip link set eth1 netns second
ip netns exec second ip addr add ${ETH1_IP}/${PREFIX} dev eth1
ip netns exec second ip link set dev eth1 up
ip netns exec second ip route add default via ${GATEWAY_IP} dev eth1
ip netns exec second <command>
```

When the last command is run, the <command> expression is executed inside the network namespace and uses the eth1 interface.
Applications running inside the new network namespace now use the secondary gVNIC. You can also use the numactl command to run your application using the memory and CPUs that are on the same NUMA node as your chosen network interface, as shown below.
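A sketch of combining both tools; the namespace name, node ID, and application path are illustrative:

```bash
# Run the application inside the "second" namespace, bound to NUMA node 1
# (assumed here to be the node that hosts the secondary gVNIC).
ip netns exec second numactl --cpunodebind=1 --membind=1 -- ./my_app
```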
High-level overview: using a virtual switch and network namespace for a secondary interface

This pattern involves creating a virtual switch setup to use the secondary gVNIC from a network namespace.
The high-level steps are as follows:
- Create a Virtual Ethernet (veth) device pair. Adjust the maximum transmission unit (MTU) on each of the devices to match the MTU of the secondary gVNIC.
- Run the following to ensure that IP forwarding is enabled for IPv4: sysctl -w net.ipv4.ip_forward=1
- Move one end of the veth pair into a new network namespace, and leave the other end in the root namespace.
- Map traffic from the veth device to the secondary gVNIC interface. There are several ways to do this; however, we recommend that you create an IP alias range for the VM's secondary interface and assign an IP address from this range to the child interface in the namespace.
- Run the application from the new network namespace. You can use the numactl command to run your application using memory and CPUs that are on the same NUMA node as the chosen network interface. A sketch of this setup follows this list.
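The following commands sketch the veth-based setup under these assumptions: the secondary gVNIC is eth1 with an MTU of 8896, and ALIAS_IP is an address from an alias range configured on the VM's secondary interface. All names and addresses are illustrative.

```bash
NS=app-ns
HOST_ADDR=10.10.0.1  # address for the root-namespace end (illustrative)
ALIAS_IP=10.10.0.5   # address from the secondary interface's alias range (illustrative)

# Create the veth pair with an MTU that matches the secondary gVNIC.
ip link add veth-host mtu 8896 type veth peer name veth-ns mtu 8896

# Enable IPv4 forwarding so the root namespace can forward traffic to eth1.
sysctl -w net.ipv4.ip_forward=1

# Move one end into a new namespace; leave the other end in the root namespace.
ip netns add "$NS"
ip link set veth-ns netns "$NS"

# Address both ends and bring the links up.
ip addr add "$HOST_ADDR"/24 dev veth-host
ip link set veth-host up
ip netns exec "$NS" ip addr add "$ALIAS_IP"/24 dev veth-ns
ip netns exec "$NS" ip link set veth-ns up
ip netns exec "$NS" ip link set lo up

# Route the namespace's traffic through the root-namespace end of the pair.
ip netns exec "$NS" ip route add default via "$HOST_ADDR"

# Run the application inside the namespace, NUMA-aligned with eth1
# (node 1 and ./my_app are illustrative).
ip netns exec "$NS" numactl --cpunodebind=1 --membind=1 -- ./my_app
```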
Alternatively, depending on the guest and workload setup, you can use the IPvlan driver with an IPvlan interface linked to the secondary gVNIC instead of creating the veth devices.
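A minimal IPvlan variant of the same idea, under the same assumptions (eth1 is the secondary gVNIC; the address and gateway variables are illustrative):

```bash
ip netns add app-ns

# Create an IPvlan interface layered on the secondary gVNIC, matching its MTU.
ip link add link eth1 name ipvl0 mtu 8896 type ipvlan mode l2
ip link set ipvl0 netns app-ns

# Assign an address from the secondary interface's alias range and route
# the namespace's traffic through it.
ip netns exec app-ns ip addr add ${ALIAS_IP}/${PREFIX} dev ipvl0
ip netns exec app-ns ip link set ipvl0 up
ip netns exec app-ns ip route add default via ${GATEWAY_IP} dev ipvl0
```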
Map an entire container's traffic to a single interface
Requirements:
- Your application must run inside a container that uses a network namespace for container networking, such as GKE, Docker, or Podman. You can't use the host network.
Many container technologies, such as GKE, Docker, and Podman, use a dedicated network namespace for a container to isolate its traffic. This network namespace may then be modified, either directly or by using the container technology's tools, to map traffic to a different network interface, as shown in the sketches below.
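For example, with Docker you might create a network backed by the ipvlan driver whose parent is the secondary gVNIC, and attach containers to it. The interface name, subnet, and image name are illustrative.

```bash
# Create a Docker network whose traffic egresses via eth1 (the secondary gVNIC).
docker network create -d ipvlan \
    --subnet 10.10.0.0/24 \
    -o parent=eth1 \
    secondary-net

# All traffic from this container now uses the secondary interface.
docker run --rm --network secondary-net my-image
```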
GKE requires that the primary interface is present for Kubernetes-internal communication. However, the default route in the pod can be changed to use the secondary interface, as shown in the following GKE pod manifest.
```yaml
metadata:
  …
  annotations:
    networking.gke.io/default-interface: 'eth1'
    networking.gke.io/interfaces: |
      [
        {"interfaceName":"eth0","network":"default"},
        {"interfaceName":"eth1","network":"secondary-network"}
      ]
```

This approach does not guarantee NUMA alignment between the default network interface and CPUs or memory.
Peer the VPCs and let the system load-balance sessions across interfaces
Requirements:
- VPC peering must be established between the VPCs of the primary and secondary gVNICs.
- Linux kernel version 6.16 is required to load-balance TCP sessions across source interfaces if sending to a single destination IP and port.
- The workload can still meet your performance requirements when the networking stack generates cross-socket memory transfers.
High-level overview
In some cases, it's challenging to shard network connections within an application or between instances of an application. In this scenario, for some applications running on A3 Ultra or A4 VMs that are not sensitive to cross-NUMA or cross-socket transfer, it can be convenient to treat the two interfaces as fungible.
One method to achieve this is to use the fib_multipath_hash_policy sysctl and a multipath route:

```bash
PRIMARY_GW=192.168.1.1    # gateway of nic0
SECONDARY_GW=192.168.2.1  # gateway of nic1
PRIMARY_IP=192.168.1.15   # internal IP for nic0
SECONDARY_IP=192.168.2.27 # internal IP for nic1

sysctl -w net.ipv4.fib_multipath_hash_policy=1  # Enable L4 5-tuple ECMP hashing

ip route add <destination-network/subnet-mask> \
    nexthop via ${PRIMARY_GW} \
    nexthop via ${SECONDARY_GW}
```

Shard traffic across networks
Requirements:
- nic0 and nic1 on the VM are in separate VPCs and subnets. This pattern requires that the destination addresses are sharded across nic0's and nic1's VPCs.
High-level overview
By default, the Linux kernel creates routes for nic0's subnet and nic1's subnet that will route traffic by destination through the appropriate network interface.

For example, suppose nic0 uses VPC net1 with subnet subnet-a, and nic1 uses VPC net2 with subnet subnet-b. By default, communications to peer IP addresses in subnet-a will use nic0, and communications to peer IP addresses in subnet-b will use nic1. For example, this scenario can occur with a set of peer single-NIC VMs connected to net1 and a set connected to net2.
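You can confirm which interface the kernel selects for a given destination with ip route get; the addresses below are illustrative.

```bash
# Each lookup prints the route that would be used, including the egress
# interface (nic0's or nic1's device name) and the source address.
ip route get 192.168.1.40   # a peer address in subnet-a
ip route get 192.168.2.40   # a peer address in subnet-b
```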
The numactl tool can be used to NUMA-align the application's memory and compute resources with its network interface to minimize cross-socket transfer.

Use SNAT to choose the source interface
Warning: This pattern is complex and has multiple interdependent parts. Make sure that you evaluate these rules in the context of other networking configurations on your machine. If rules are misconfigured, system-wide traffic could be misrouted or dropped entirely.

Requirements:
- CAP_NET_ADMIN is required for setting up the initial iptables rules, though not for running the application.
- You must carefully evaluate rules when using them in combination with other nontrivial iptables rules or routing configurations.
Note:
- The NIC binding is only correct at the time the connection is created. If a thread moves to a CPU associated with a different NUMA node, the connection will suffer cross-NUMA penalties. Therefore, this solution is most useful when there is some mechanism to bind threads to specific CPU sets.
- Only connections originated by this machine will be bound to a specific NIC. Inbound connections will be associated with the NIC matching whatever address they are destined to.
High-level overview
In scenarios where it's challenging to use network namespaces or make application changes, you can use NAT to pick a source interface. You can use tools like iptables to rewrite the source IP for a flow to match a particular interface's IP based on a property of the sending application, such as cgroup, user, or CPU.

The following example uses CPU-based rules. The end result is that a flow that originates from a thread running on any given CPU is transmitted by the gVNIC that's attached to that CPU's corresponding NUMA node.
```bash
# --- Begin Configuration ---
OUTPUT_INTERFACE_0="enp0s19"        # CHANGEME: NIC0
OUTPUT_INTERFACE_1="enp192s20"      # CHANGEME: NIC1

CPUS_0=($(seq 0 55; seq 112 167))   # CHANGEME: CPU IDs for NIC0
GATEWAY_0="10.0.0.1"                # CHANGEME: Gateway for NIC0
SNAT_IP_0="10.0.0.2"                # CHANGEME: SNAT IP for NIC0
CONNMARK_0="0x1"
RT_TABLE_0="100"

CPUS_1=($(seq 56 111; seq 168 223)) # CHANGEME: CPU IDs for NIC1
GATEWAY_1="10.0.1.1"                # CHANGEME: Gateway for NIC1
SNAT_IP_1="10.0.1.2"                # CHANGEME: SNAT IP for NIC1
CONNMARK_1="0x2"
RT_TABLE_1="101"
# --- End Configuration ---

# This informs which interface to use for packets in each table.
ip route add default via "$GATEWAY_0" dev "$OUTPUT_INTERFACE_0" table "$RT_TABLE_0"
ip route add default via "$GATEWAY_1" dev "$OUTPUT_INTERFACE_1" table "$RT_TABLE_1"

# This is not required for connections we originate, but replies to
# connections from peers need to know which interface to egress from.
# Add it before the fwmark rules to implicitly make sure fwmark takes precedence.
ip rule add from "$SNAT_IP_0" table "$RT_TABLE_0"
ip rule add from "$SNAT_IP_1" table "$RT_TABLE_1"

# This informs which table to use based on the packet mark set in OUTPUT.
ip rule add fwmark "$CONNMARK_0" table "$RT_TABLE_0"
ip rule add fwmark "$CONNMARK_1" table "$RT_TABLE_1"

# Relax reverse path filtering.
# Otherwise, we will drop legitimate replies to the SNAT IPs.
sysctl -w net.ipv4.conf."$OUTPUT_INTERFACE_0".rp_filter=2
sysctl -w net.ipv4.conf."$OUTPUT_INTERFACE_1".rp_filter=2

# Mark packets/connections with a per-NIC mark based on the source CPU.
# The fwmark rules will then use the corresponding routing table for this traffic.
for cpu_id in "${CPUS_0[@]}"; do
  iptables -t mangle -A OUTPUT -m state --state NEW -m cpu --cpu "$cpu_id" -j CONNMARK --set-mark "$CONNMARK_0"
  iptables -t mangle -A OUTPUT -m state --state NEW -m cpu --cpu "$cpu_id" -j MARK --set-mark "$CONNMARK_0"
done
for cpu_id in "${CPUS_1[@]}"; do
  iptables -t mangle -A OUTPUT -m state --state NEW -m cpu --cpu "$cpu_id" -j CONNMARK --set-mark "$CONNMARK_1"
  iptables -t mangle -A OUTPUT -m state --state NEW -m cpu --cpu "$cpu_id" -j MARK --set-mark "$CONNMARK_1"
done

# For established connections, restore the connection mark.
# Otherwise, we will send the packet to the wrong NIC, depending on existing
# routing rules.
iptables -t mangle -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j CONNMARK --restore-mark

# These rules NAT the source address after the packet is already destined to
# egress the correct interface. This lets replies to this flow target the
# correct NIC, and may be required to be accepted into the VPC.
iptables -t nat -A POSTROUTING -m mark --mark "$CONNMARK_0" -j SNAT --to-source "$SNAT_IP_0"
iptables -t nat -A POSTROUTING -m mark --mark "$CONNMARK_1" -j SNAT --to-source "$SNAT_IP_1"
```