Patterns for using multiple host NICs (GPU VMs)
Some accelerator-optimized machines, including A3 Ultra, A4, and A4X, have two host network interfaces in addition to the MRDMA interfaces on these machines. On the host, these are Titanium IPUs, which are plugged into separate CPU sockets and non-uniform memory access (NUMA) nodes. These IPUs are available inside the VM as Google Virtual NICs (gVNICs), and provide network bandwidth for storage activities such as checkpointing, loading training data, loading models, and other general networking needs. The machine's NUMA topology, including that of the gVNICs, is visible to the guest operating system (OS).
This document describes best practices for using the two gVNICs on these machines.
Overview
In general, we recommend that you use the following configurations, regardless of how you plan to use multiple host NICs:
- Network settings: Each gVNIC must have a unique VPC network. When setting up a VPC, consider the following:
  - Use a large maximum transmission unit (MTU) for each VPC network. 8896 is the maximum supported MTU and a recommended choice. The ingress performance for some workloads might be slowed down because the system drops incoming data packets on the receiver side. You can use the ethtool tool to check for this issue (see the example after this list). In this scenario, it can be helpful to adjust the TCP MSS, interface MTU, or VPC MTU to allow for efficient data allocation from the page cache, which allows the incoming layer 2 frame to fit within two 4 KB buffers.
- Application settings
  - NUMA-align the application. Use CPU cores, memory allocations, and a network interface from the same NUMA node. If you are running a dedicated instance of the application to use a specific NUMA node or network interface, you can use tools like numactl to attach the application's CPU and memory resources to a specific NUMA node (see the example after this list).
- Operating system settings
  - Enable TCP segmentation offload (TSO) and large receive offload (LRO).
  - For each gVNIC interface, ensure that the SMP affinity is set up so that its interrupt requests (IRQs) are handled on the same NUMA node as the interface, and spread interrupts out across cores. If you're running a Google-supplied guest OS image, this process happens automatically using the google_set_multiqueue script.
  - Evaluate settings like RFS, RPS, and XPS to see if they might be helpful for your workload.
  - For A4X, NVIDIA recommends disabling automatic NUMA scheduling.
- Linux kernel bonding is not supported for the gVNICs on these machines.
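For example, the following commands sketch how you might check for receiver-side packet drops and NUMA-align a process. The interface name, NUMA node ID, and application path are illustrative; substitute your own values.

```bash
# Check a gVNIC's drop counters; steadily rising rx drop counters can
# indicate the receiver-side issue described above. (eth1 is illustrative.)
ethtool -S eth1 | grep -i drop

# Confirm the MTU currently configured on the interface.
ip link show eth1

# Discover the NUMA topology, then run an application with its CPU and
# memory resources bound to the same NUMA node as the chosen interface.
# (Node 1 and ./my_app are illustrative.)
numactl --hardware
numactl --cpunodebind=1 --membind=1 -- ./my_app
```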
Patterns for using multiple host NICs
This section outlines general patterns for using multiple host NICs on Google Cloud.
| Pattern | Supported process layout | Deployment path: Compute Engine | Deployment path: GKE | Deployment path: Slurm | Notes |
|---|---|---|---|---|---|
| Change application to use a specific interface | Process shard per interface | Yes | Yes | Yes | Requires code changes to the application |
| Change application to use both interfaces | Dual-interface process | Yes | Yes | Yes | Requires code changes to the application |
| Use a dedicated network namespace for specific applications | Process shard per interface | Yes | Yes (privileged containers only) | No | |
| Map an entire container's traffic to a single interface | All container traffic mapped to one interface | Yes | Yes | No | |
| Peer the VPCs and let the system load-balance sessions across interfaces | Dual-interface process | Yes* | Yes* | Yes* | Challenging or impossible to NUMA-align. Requires Linux kernel 6.16 or later.* |
| Shard traffic across networks | Dual-interface process; process shard per interface | Yes* | Yes* | Yes* | Might require code changes to NUMA-align if running a dual-interface process. |
| Use SNAT to choose the source interface | Dual-interface process; process shard per interface | Yes | Yes (setup requires administrator privileges) | Yes (setup requires administrator privileges) | Can be more challenging to configure correctly |
* This option is not generally recommended but might be useful for limited workloads on x86 (A3 Ultra and A4) platforms.
Change application to use a specific interface
Requirements:
- This method requires code changes to your application.
- Requires permissions for one or more of the following methods:
  - bind(): only requires special permissions if using a privileged source port.
  - SO_BINDTODEVICE: requires the CAP_NET_RAW capability.
- This method can require you to modify your kernel routing table to establish routes and to prevent asymmetric routing.
High-level overview
Note: Some applications and libraries are already capable of and optimized for multi-NUMA or multi-socket scenarios, including multiple network interfaces. If you're using third-party or open source software, be sure to check the documentation for that software.

With this pattern, you complete the following:
- Add network interface binding to your application's source code by using one of the following options:
  - Use bind() to bind a socket to a particular source IP address.
  - Use the SO_BINDTODEVICE socket option to bind a socket to a particular network interface.
- Modify the kernel routing table as needed to ensure a route exists from the source network interface to the destination address. In addition, routes might be required to prevent asymmetric routing. We recommend that you configure policy routing as described in Configure routing for an additional network interface (see the sketch after this list).
- You can also use the numactl command to run your application. In this approach, you use the memory and CPUs that are on the same NUMA node as your chosen network interface.
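The following is a minimal policy-routing sketch for a secondary interface. It assumes the secondary interface is eth1; the addresses, prefix, and table number are illustrative and must match your VM's configuration.

```bash
ETH1_IP=192.168.2.27   # internal IP of the secondary interface (illustrative)
GATEWAY_IP=192.168.2.1 # gateway of the secondary subnet (illustrative)
RT_TABLE=100           # dedicated routing table ID (illustrative)

# Route traffic that originates from eth1's address through eth1's gateway,
# which prevents replies from egressing the primary interface (asymmetric routing).
ip route add default via "$GATEWAY_IP" dev eth1 table "$RT_TABLE"
ip rule add from "$ETH1_IP"/32 table "$RT_TABLE"
```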
After you complete the preceding steps, instances of your application run using a specific network interface.
Change application to use both interfaces
Requirements:
- This method requires code changes to your application.
- You require permissions for one or more of the following methods:
  - bind(): only requires special permissions if using a privileged source port.
  - SO_BINDTODEVICE: requires the CAP_NET_RAW capability.
- This method can require you to modify your kernel routing table to establish routes and to prevent asymmetric routing.
High-level overview
Note: Some applications and libraries are already capable of and optimized for multi-NUMA or multi-socket scenarios along with multiple network interfaces. If using third-party or open source software, be sure to check the documentation for that software.

To implement this pattern, you do the following:
- Add network interface binding to your application's source code by using one of the following options:
  - Use the bind() system call to bind a socket to a particular source IP address.
  - Use the SO_BINDTODEVICE socket option to bind a socket to a particular network interface.
- If your application is acting as the client, you will need to create a separate client socket for each source network interface.
- Modify the kernel routing table as needed to ensure a route exists from the source network interface to the destination address. In addition, you might also require routes to prevent asymmetric routing. We recommend that you configure policy routing as described in Configure routing for an additional network interface.
- We recommend that you partition network activity into threads that run on the same NUMA node as the gVNIC interface. One common way of requesting a specific NUMA node for a thread is to call pthread_setaffinity_np.
  - Since the application utilizes resources on multiple NUMA nodes, avoid using numactl, or ensure your numactl command includes the NUMA nodes of all network interfaces used by your application (see the example after this list).
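For example, a dual-interface process can be launched so that its allowed CPUs and memory span both NUMA nodes that host the gVNICs. The node IDs and application path below are illustrative; check your topology with numactl --hardware.

```bash
# Allow the process to use CPUs and memory on both NUMA nodes (0 and 1 here),
# matching the NUMA nodes of the two gVNICs. The application's own threads
# can then pin themselves per node, for example with pthread_setaffinity_np.
numactl --cpunodebind=0,1 --membind=0,1 -- ./my_app
```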
Use a dedicated network namespace for specific applications
Requirements:
- Requires the CAP_SYS_ADMIN capability.
- Not compatible with GKE Autopilot.
- If using GKE, you must have a privileged container.
This section describes patterns that you can use to create a network namespace that uses a secondary network interface. The right pattern for your workload depends on your specific scenario. The approaches that use a virtual switch or IPvlan are better suited to cases where multiple applications need to use the secondary interface from different network namespaces.
High-level overview: moving the secondary interface into a dedicated network namespace
This pattern involves creating a network namespace, moving the secondary gVNIC interface into the new namespace, and then running the application from this namespace. This pattern might be less complicated to set up and tune compared to using a virtual switch. However, applications outside of the new network namespace will be unable to access the secondary gVNIC.
The following example shows a series of commands that can be used to move eth1 into the new network namespace called second.
```bash
ip netns add second
ip link set eth1 netns second
ip netns exec second ip addr add ${ETH1_IP}/${PREFIX} dev eth1
ip netns exec second ip link set dev eth1 up
ip netns exec second ip route add default via ${GATEWAY_IP} dev eth1
ip netns exec second <command>
```

When the last command is run, the <command> expression is executed inside the network namespace and uses the eth1 interface.
Applications running inside the new network namespace now use the secondary gVNIC. You can also use the numactl command to run your application using the memory and CPUs that are on the same NUMA node as your chosen network interface, as shown below.
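A sketch of combining both tools; the namespace name, node ID, and application path are illustrative:

```bash
# Run the application inside the "second" namespace, bound to NUMA node 1
# (assumed here to be the node that hosts the secondary gVNIC).
ip netns exec second numactl --cpunodebind=1 --membind=1 -- ./my_app
```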
High-level overview: using a virtual switch and network namespace for a secondary interface

This pattern involves creating a virtual switch setup to use the secondary gVNIC from a network namespace.
The high-level steps are as follows:
- Create a Virtual Ethernet (veth) device pair. Adjust the maximum transmission unit (MTU) on each of the devices to match the MTU of the secondary gVNIC.
- Run the following to ensure that IP forwarding is enabled for IPv4: sysctl -w net.ipv4.ip_forward=1
- Move one end of the veth pair into a new network namespace, and leave the other end in the root namespace.
- Map traffic from the veth device to the secondary gVNIC interface. There are several ways to do this; however, we recommend that you create an IP alias range for the VM's secondary interface and assign an IP address from this range to the child interface in the namespace.
- Run the application from the new network namespace. You can use the numactl command to run your application using memory and CPUs that are on the same NUMA node as the chosen network interface. A sketch of this setup follows this list.
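The following commands sketch the veth-based setup under these assumptions: the secondary gVNIC is eth1 with an MTU of 8896, and ALIAS_IP is an address from an alias range configured on the VM's secondary interface. All names and addresses are illustrative.

```bash
NS=app-ns
HOST_ADDR=10.10.0.1  # address for the root-namespace end (illustrative)
ALIAS_IP=10.10.0.5   # address from the secondary interface's alias range (illustrative)

# Create the veth pair with an MTU that matches the secondary gVNIC.
ip link add veth-host mtu 8896 type veth peer name veth-ns mtu 8896

# Enable IPv4 forwarding so the root namespace can forward traffic to eth1.
sysctl -w net.ipv4.ip_forward=1

# Move one end into a new namespace; leave the other end in the root namespace.
ip netns add "$NS"
ip link set veth-ns netns "$NS"

# Address both ends and bring the links up.
ip addr add "$HOST_ADDR"/24 dev veth-host
ip link set veth-host up
ip netns exec "$NS" ip addr add "$ALIAS_IP"/24 dev veth-ns
ip netns exec "$NS" ip link set veth-ns up
ip netns exec "$NS" ip link set lo up

# Route the namespace's traffic through the root-namespace end of the pair.
ip netns exec "$NS" ip route add default via "$HOST_ADDR"

# Run the application inside the namespace, NUMA-aligned with eth1
# (node 1 and ./my_app are illustrative).
ip netns exec "$NS" numactl --cpunodebind=1 --membind=1 -- ./my_app
```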
Alternatively, depending on the guest and workload setup, you can use the IPvlan driver with an IPvlan interface linked to the secondary gVNIC instead of creating the veth devices.
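A minimal IPvlan variant of the same idea, under the same assumptions (eth1 is the secondary gVNIC; the address and gateway variables are illustrative):

```bash
ip netns add app-ns

# Create an IPvlan interface layered on the secondary gVNIC, matching its MTU.
ip link add link eth1 name ipvl0 mtu 8896 type ipvlan mode l2
ip link set ipvl0 netns app-ns

# Assign an address from the secondary interface's alias range and route
# the namespace's traffic through it.
ip netns exec app-ns ip addr add ${ALIAS_IP}/${PREFIX} dev ipvl0
ip netns exec app-ns ip link set ipvl0 up
ip netns exec app-ns ip route add default via ${GATEWAY_IP} dev ipvl0
```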
Map an entire container's traffic to a single interface
Requirements:
- Your application must run inside a container that uses a network namespace for container networking, such as GKE, Docker, or Podman. You can't use the host network.
Many container technologies, such as GKE, Docker, and Podman, use a dedicated network namespace for a container to isolate its traffic. This network namespace may then be modified, either directly or by using the container technology's tools, to map traffic to a different network interface, as shown in the sketches below.
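For example, with Docker you might create a network backed by the ipvlan driver whose parent is the secondary gVNIC, and attach containers to it. The interface name, subnet, and image name are illustrative.

```bash
# Create a Docker network whose traffic egresses via eth1 (the secondary gVNIC).
docker network create -d ipvlan \
    --subnet 10.10.0.0/24 \
    -o parent=eth1 \
    secondary-net

# All traffic from this container now uses the secondary interface.
docker run --rm --network secondary-net my-image
```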
GKE requires that the primary interface is present for Kubernetes-internal communication. However, the default route in the pod can be changed to use the secondary interface, as shown in the following GKE pod manifest.
```yaml
metadata:
  …
  annotations:
    networking.gke.io/default-interface: 'eth1'
    networking.gke.io/interfaces: |
      [
        {"interfaceName":"eth0","network":"default"},
        {"interfaceName":"eth1","network":"secondary-network"}
      ]
```

This approach does not guarantee NUMA alignment between the default network interface and CPUs or memory.
Peer the VPCs and let the system load-balance sessions across interfaces
Requirements:
- VPC peering must be established between the VPCs of the primary and secondary gVNICs.
- Linux kernel version 6.16 is required to load-balance TCP sessions across source interfaces if sending to a single destination IP and port.
- The workload can still meet your performance requirements when the networking stack generates cross-socket memory transfers.
High-level overview
In some cases, it's challenging to shard network connections within an application or between instances of an application. In this scenario, for some applications running on A3 Ultra or A4 VMs that are not sensitive to cross-NUMA or cross-socket transfer, it can be convenient to treat the two interfaces as fungible.
One method to achieve this is to use the fib_multipath_hash_policy sysctl and a multipath route:

```bash
PRIMARY_GW=192.168.1.1    # gateway of nic0
SECONDARY_GW=192.168.2.1  # gateway of nic1
PRIMARY_IP=192.168.1.15   # internal IP for nic0
SECONDARY_IP=192.168.2.27 # internal IP for nic1

sysctl -w net.ipv4.fib_multipath_hash_policy=1  # Enable L4 5-tuple ECMP hashing

ip route add <destination-network/subnet-mask> \
    nexthop via ${PRIMARY_GW} \
    nexthop via ${SECONDARY_GW}
```

Shard traffic across networks
Requirements:
- nic0 and nic1 on the VM are in separate VPCs and subnets. This pattern requires that the destination addresses are sharded across nic0's and nic1's VPCs.
High-level overview
By default, the Linux kernel creates routes for nic0's subnet and nic1's subnet that will route traffic by destination through the appropriate network interface.

For example, suppose nic0 uses VPC net1 with subnet subnet-a, and nic1 uses VPC net2 with subnet subnet-b. By default, communications to peer IP addresses in subnet-a will use nic0, and communications to peer IP addresses in subnet-b will use nic1. For example, this scenario can occur with a set of peer single-NIC VMs connected to net1 and a set connected to net2.
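You can confirm which interface the kernel selects for a given destination with ip route get; the addresses below are illustrative.

```bash
# Each lookup prints the route that would be used, including the egress
# interface (nic0's or nic1's device name) and the source address.
ip route get 192.168.1.40   # a peer address in subnet-a
ip route get 192.168.2.40   # a peer address in subnet-b
```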
The numactl tool can be used to NUMA-align the application's memory and compute resources with its network interface to minimize cross-socket transfer.

Use SNAT to choose the source interface
Warning: This pattern is complex and has multiple interdependent parts. Make sure that you evaluate these rules in the context of other networking configurations on your machine. If rules are misconfigured, system-wide traffic could be misrouted or dropped entirely.

Requirements:
- CAP_NET_ADMIN is required for setting up the initial iptables rules, though not for running the application.
- You must carefully evaluate rules when using them in combination with other nontrivial iptables rules or routing configurations.
Note:
- The NIC binding is only correct at the time the connection is created. If a thread moves to a CPU associated with a different NUMA node, the connection will suffer cross-NUMA penalties. Therefore, this solution is most useful when there is some mechanism to bind threads to specific CPU sets.
- Only connections originated by this machine will be bound to a specific NIC. Inbound connections will be associated with the NIC matching whatever address they are destined to.
High-level overview
In scenarios where it's challenging to use network namespaces or make application changes, you can use NAT to pick a source interface. You can use tools like iptables to rewrite the source IP for a flow to match a particular interface's IP based on a property of the sending application, such as cgroup, user, or CPU.

The following example uses CPU-based rules. The end result is that a flow that originates from a thread running on any given CPU is transmitted by the gVNIC that's attached to that CPU's corresponding NUMA node.
```bash
# --- Begin Configuration ---
OUTPUT_INTERFACE_0="enp0s19"        # CHANGEME: NIC0
OUTPUT_INTERFACE_1="enp192s20"      # CHANGEME: NIC1

CPUS_0=($(seq 0 55; seq 112 167))   # CHANGEME: CPU IDs for NIC0
GATEWAY_0="10.0.0.1"                # CHANGEME: Gateway for NIC0
SNAT_IP_0="10.0.0.2"                # CHANGEME: SNAT IP for NIC0
CONNMARK_0="0x1"
RT_TABLE_0="100"

CPUS_1=($(seq 56 111; seq 168 223)) # CHANGEME: CPU IDs for NIC1
GATEWAY_1="10.0.1.1"                # CHANGEME: Gateway for NIC1
SNAT_IP_1="10.0.1.2"                # CHANGEME: SNAT IP for NIC1
CONNMARK_1="0x2"
RT_TABLE_1="101"
# --- End Configuration ---

# This informs which interface to use for packets in each table.
ip route add default via "$GATEWAY_0" dev "$OUTPUT_INTERFACE_0" table "$RT_TABLE_0"
ip route add default via "$GATEWAY_1" dev "$OUTPUT_INTERFACE_1" table "$RT_TABLE_1"

# This is not required for connections we originate, but replies to
# connections from peers need to know which interface to egress from.
# Add it before the fwmark rules to implicitly make sure fwmark takes precedence.
ip rule add from "$SNAT_IP_0" table "$RT_TABLE_0"
ip rule add from "$SNAT_IP_1" table "$RT_TABLE_1"

# This informs which table to use based on the packet mark set in OUTPUT.
ip rule add fwmark "$CONNMARK_0" table "$RT_TABLE_0"
ip rule add fwmark "$CONNMARK_1" table "$RT_TABLE_1"

# Relax reverse path filtering.
# Otherwise, we will drop legitimate replies to the SNAT IPs.
sysctl -w net.ipv4.conf."$OUTPUT_INTERFACE_0".rp_filter=2
sysctl -w net.ipv4.conf."$OUTPUT_INTERFACE_1".rp_filter=2

# Mark packets/connections with a per-NIC mark based on the source CPU.
# The fwmark rules will then use the corresponding routing table for this traffic.
for cpu_id in "${CPUS_0[@]}"; do
  iptables -t mangle -A OUTPUT -m state --state NEW -m cpu --cpu "$cpu_id" -j CONNMARK --set-mark "$CONNMARK_0"
  iptables -t mangle -A OUTPUT -m state --state NEW -m cpu --cpu "$cpu_id" -j MARK --set-mark "$CONNMARK_0"
done
for cpu_id in "${CPUS_1[@]}"; do
  iptables -t mangle -A OUTPUT -m state --state NEW -m cpu --cpu "$cpu_id" -j CONNMARK --set-mark "$CONNMARK_1"
  iptables -t mangle -A OUTPUT -m state --state NEW -m cpu --cpu "$cpu_id" -j MARK --set-mark "$CONNMARK_1"
done

# For established connections, restore the connection mark.
# Otherwise, we will send the packet to the wrong NIC, depending on existing
# routing rules.
iptables -t mangle -A OUTPUT -m conntrack --ctstate ESTABLISHED,RELATED -j CONNMARK --restore-mark

# These rules NAT the source address after the packet is already destined to
# egress the correct interface. This lets replies to this flow target the
# correct NIC, and may be required to be accepted into the VPC.
iptables -t nat -A POSTROUTING -m mark --mark "$CONNMARK_0" -j SNAT --to-source "$SNAT_IP_0"
iptables -t nat -A POSTROUTING -m mark --mark "$CONNMARK_1" -j SNAT --to-source "$SNAT_IP_1"
```