Create an AI-optimized instance with A4X

This document describes how to create standalone virtual machine (VM) instances that use A4X accelerator-optimized machine types. To learn more about the machine type, see A4X series in the Compute Engine documentation.

To learn about VM and cluster creation options, see the Deployment options overview page.

Before you begin

Before creating VMs, if you haven't already done so, complete the following steps:

  1. Choose a consumption option: your choice of consumption option determines how you get and use GPU resources.

    To learn more, see Choose a consumption option.

  2. Obtain capacity: the process to obtain capacity differs for each consumption option.

    To learn about the process to obtain capacity for your chosen consumption option, see Capacity overview.

    Note: When you request A4X capacity, you obtain it in the all capacity mode. This mode is the only supported reservation operational mode for A4X machine types. For more information about all capacity mode, see Reservation operational mode.

Select the tab for how you plan to use the samples on this page:

Console

When you use the Google Cloud console to access Google Cloud services and APIs, you don't need to set up authentication.

gcloud

In the Google Cloud console, activate Cloud Shell.

At the bottom of the Google Cloud console, a Cloud Shell session starts and displays a command-line prompt. Cloud Shell is a shell environment with the Google Cloud CLI already installed and with values already set for your current project. It can take a few seconds for the session to initialize.

REST

To use the REST API samples on this page in a local development environment, you use the credentials you provide to the gcloud CLI.

    Install the Google Cloud CLI. After installation, initialize the Google Cloud CLI by running the following command:

    gcloud init

    If you're using an external identity provider (IdP), you must first sign in to the gcloud CLI with your federated identity.

For more information, see Authenticate for using REST in the Google Cloud authentication documentation.
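As a rough illustration of how a REST sample from this page could be sent from a local shell, the following sketch builds the instances.insert URL and posts a JSON body with the access token from the gcloud CLI. The function names, the DRY_RUN switch, and the request.json file name are hypothetical and are not part of the original samples. By default the script only prints what it would send.

```shell
#!/bin/bash
# Sketch only: helper names, DRY_RUN, and request.json are assumptions.

# Build the Compute Engine v1 URL for instances.insert.
instances_insert_url() {
  local project="$1" zone="$2"
  echo "https://compute.googleapis.com/compute/v1/projects/${project}/zones/${zone}/instances"
}

# Send a JSON request body by using the gcloud CLI's access token.
# With DRY_RUN=1 (the default), print the request instead of sending it.
send_insert_request() {
  local project="$1" zone="$2" body_file="$3"
  local url
  url="$(instances_insert_url "$project" "$zone")"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    echo "POST ${url} (body: ${body_file})"
    return 0
  fi
  curl -X POST \
    -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json; charset=utf-8" \
    -d @"${body_file}" \
    "${url}"
}

send_insert_request PROJECT_ID ZONE request.json
```

To send the request for real, save the JSON body of the REST sample to a file, replace the placeholders, and run the script with DRY_RUN=0.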

Required roles

To get the permissions that you need to create VMs, ask your administrator to grant you the Compute Instance Admin (v1) (roles/compute.instanceAdmin.v1) IAM role on the project. For more information about granting roles, see Manage access to projects, folders, and organizations.

This predefined role contains the permissions required to create VMs. To see the exact permissions that are required, expand the Required permissions section:

Required permissions

The following permissions are required to create VMs:

  • compute.instances.create on the project
  • To use a custom image to create the VM: compute.images.useReadOnly on the image
  • To use a snapshot to create the VM: compute.snapshots.useReadOnly on the snapshot
  • To use an instance template to create the VM: compute.instanceTemplates.useReadOnly on the instance template
  • To specify a subnet for your VM: compute.subnetworks.use on the project or on the chosen subnet
  • To specify a static IP address for the VM: compute.addresses.use on the project
  • To assign an external IP address to the VM when using a VPC network: compute.subnetworks.useExternalIp on the project or on the chosen subnet
  • To assign a legacy network to the VM: compute.networks.use on the project
  • To assign an external IP address to the VM when using a legacy network: compute.networks.useExternalIp on the project
  • To set VM instance metadata for the VM: compute.instances.setMetadata on the project
  • To set tags for the VM: compute.instances.setTags on the VM
  • To set labels for the VM: compute.instances.setLabels on the VM
  • To set a service account for the VM to use: compute.instances.setServiceAccount on the VM
  • To create a new disk for the VM: compute.disks.create on the project
  • To attach an existing disk in read-only or read-write mode: compute.disks.use on the disk
  • To attach an existing disk in read-only mode: compute.disks.useReadOnly on the disk

You might also be able to get these permissions with custom roles or other predefined roles.
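For administrators, one possible way to grant the role is the gcloud projects add-iam-policy-binding command. The sketch below only prints the command so that it can be reviewed before running. The helper name and the example principal user:alice@example.com are hypothetical.

```shell
#!/bin/bash
# Sketch only: grant_role_command and the example principal are assumptions.

# Print the command that grants the Compute Instance Admin (v1) role
# on a project to a principal such as user:EMAIL.
grant_role_command() {
  local project="$1" principal="$2"
  echo "gcloud projects add-iam-policy-binding ${project} --member=${principal} --role=roles/compute.instanceAdmin.v1"
}

grant_role_command PROJECT_ID user:alice@example.com
```

Review the printed command, replace PROJECT_ID and the principal, and then run it as a user who can change the project's IAM policy.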

A4X fundamentals

An A4X cluster is organized into a hierarchy of blocks and subblocks to facilitate large-scale, non-blocking network performance. Understanding this topology is key when reserving capacity and deploying workloads.

A4X instance
An A4X instance is a single A4X machine type that has 4 GPUs attached.
NVLink domain or subblock
An NVLink domain, also referred to as a subblock, is the fundamental unit of A4X capacity. An NVLink domain consists of 18 A4X instances (72 GPUs) connected using a multi-node NVLink system. You create an A4X NVLink domain or a subblock by applying a compact placement policy that specifies a 1x72 topology.
Block
An A4X block is composed of 25 subblocks (NVLink domains), totalling up to 450 A4X instances (1,800 GPUs). The subblocks are rail-aligned for efficient scaling. Each subblock requires a compact placement policy. Therefore, for a single A4X block, you can create 25 compact placement policies.

The following table shows the supported topology options for A4X instances:

Topology (gpuTopology)    Number of GPUs    Number of instances
1x72                      72                18
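The capacity figures above follow from simple multiplication. As a quick check, the following sketch reproduces them and estimates how many subblocks (and therefore compact placement policies) a given GPU count needs. The function and variable names are illustrative, not part of any Google Cloud tooling.

```shell
#!/bin/bash
# Sketch only: names are illustrative; the constants come from this page.
GPUS_PER_INSTANCE=4
INSTANCES_PER_SUBBLOCK=18
SUBBLOCKS_PER_BLOCK=25

gpus_per_subblock() { echo $((GPUS_PER_INSTANCE * INSTANCES_PER_SUBBLOCK)); }
instances_per_block() { echo $((INSTANCES_PER_SUBBLOCK * SUBBLOCKS_PER_BLOCK)); }
gpus_per_block() { echo $(($(gpus_per_subblock) * SUBBLOCKS_PER_BLOCK)); }

# Round up: a partially used subblock still needs its own placement policy.
subblocks_for_gpus() {
  local gpus="$1" per
  per="$(gpus_per_subblock)"
  echo $(((gpus + per - 1) / per))
}

echo "GPUs per subblock: $(gpus_per_subblock)"
echo "Instances per block: $(instances_per_block)"
echo "GPUs per block: $(gpus_per_block)"
echo "Subblocks needed for 100 GPUs: $(subblocks_for_gpus 100)"
```

For example, a workload that needs 100 GPUs spans two subblocks, so it needs two compact placement policies and the RoCE network for traffic between the subblocks.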

Overview

Tip: To use an NVLink domain with the 1x72 topology, you can run the single instance creation commands 18 times or use the instances.bulkInsert method that is designed to create multiple instances with a single API request. To create A4X instances in bulk, see Create A4X instances in bulk.

Creating an instance with the A4X machine type includes the following steps:

  1. Create VPC networks
  2. Create a compact placement policy
  3. Create an instance

Create VPC networks

Tip: If you are setting up a quick test, you can skip this step and specify a single NIC with --network-interface=nic-type=GVNIC instead.

To set up the network for the A4X machine type, create three VPC networks for the following network interfaces:

  • 2 regular VPC networks for the gVNIC network interfaces (NICs). These are used for host-to-host communication.
  • 1 VPC network with the RoCE network profile, which is required for the CX-7 NICs when creating multiple A4X subblocks. The RoCE VPC network needs to have 4 subnets, one subnet for each CX-7 NIC. These NICs use RDMA over Converged Ethernet (RoCE), providing the high-bandwidth, low-latency communication that's essential for scaling out to multiple A4X subblocks. For a single A4X subblock, you can skip this VPC network because within a single subblock, direct GPU-to-GPU communication is handled by the multi-node NVLink.

For more information about NIC arrangement, see Review network bandwidth and NIC arrangement.

Create the networks either manually by following the instruction guides or automatically by using the provided script.
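The NIC arrangement described above can be expressed as a set of --network-interface flags. The following sketch prints the flags for a single-subblock setup (gVNIC only) or a multi-subblock setup (gVNIC plus four MRDMA NICs). The nic_flags helper and its mode argument are hypothetical. The network and subnet names assume the naming scheme used by the scripts on this page.

```shell
#!/bin/bash
# Sketch only: nic_flags and its "single"/"multi" modes are assumptions.

# Print the --network-interface flags for one A4X instance.
#   mode:  "single" (one subblock, gVNIC only) or
#          "multi"  (several subblocks, gVNIC plus four MRDMA NICs)
#   gvnic: name prefix of the regular VPC networks
#   rdma:  name prefix of the RoCE VPC network
nic_flags() {
  local mode="$1" gvnic="$2" rdma="$3"
  local n
  echo "--network-interface=nic-type=GVNIC,network=${gvnic}-net-0,subnet=${gvnic}-sub-0"
  echo "--network-interface=nic-type=GVNIC,network=${gvnic}-net-1,subnet=${gvnic}-sub-1,no-address"
  if [ "$mode" = "multi" ]; then
    for n in 0 1 2 3; do
      echo "--network-interface=nic-type=MRDMA,network=${rdma}-mrdma,subnet=${rdma}-mrdma-sub-${n},no-address"
    done
  fi
}

nic_flags multi GVNIC_NAME_PREFIX RDMA_NAME_PREFIX
```

Calling nic_flags with single instead of multi prints only the two gVNIC flags, which matches a deployment that stays within one NVLink domain.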

Instruction guides

To create the networks, you can use the following instructions:

For these VPC networks, we recommend setting the maximum transmission unit (MTU) to a larger value. For the A4X machine type, the recommended MTU is 8896 bytes. To review the recommended MTU settings for other GPU machine types, see MTU settings for GPU machine types.

Script

To create the networks, follow these steps.

For these VPC networks, we recommend setting the maximum transmission unit (MTU) to a larger value. For the A4X machine type, the recommended MTU is 8896 bytes. To review the recommended MTU settings for other GPU machine types, see MTU settings for GPU machine types.

  1. Use the following script to create regular VPC networks for the gVNICs.

        #!/bin/bash

        # Create regular VPC networks and subnets for the gVNICs
        for N in $(seq 0 1); do
          gcloud compute networks create GVNIC_NAME_PREFIX-net-$N \
            --subnet-mode=custom \
            --mtu=8896

          gcloud compute networks subnets create GVNIC_NAME_PREFIX-sub-$N \
            --network=GVNIC_NAME_PREFIX-net-$N \
            --region=REGION \
            --range=192.168.$N.0/24

          gcloud compute firewall-rules create GVNIC_NAME_PREFIX-internal-$N \
            --network=GVNIC_NAME_PREFIX-net-$N \
            --action=ALLOW \
            --rules=tcp:0-65535,udp:0-65535,icmp \
            --source-ranges=192.168.0.0/16
        done

        # Create SSH firewall rules
        gcloud compute firewall-rules create GVNIC_NAME_PREFIX-ssh \
          --network=GVNIC_NAME_PREFIX-net-0 \
          --action=ALLOW \
          --rules=tcp:22 \
          --source-ranges=IP_RANGE

        # Assumes that an external IP is only created for vNIC 0
        gcloud compute firewall-rules create GVNIC_NAME_PREFIX-allow-ping-net-0 \
          --network=GVNIC_NAME_PREFIX-net-0 \
          --action=ALLOW \
          --rules=icmp \
          --source-ranges=IP_RANGE

  2. If you require multiple A4X subblocks, use the following script to create the RoCE VPC network and subnets for the four CX-7 NICs on each A4X instance.

    Important: If your deployment consists of only a single A4X subblock, you can skip this step.

        # List and make sure network profiles exist in the machine type's zone
        gcloud compute network-profiles list --filter "location.name=ZONE"

        # Create network for CX-7
        gcloud compute networks create RDMA_NAME_PREFIX-mrdma \
          --network-profile=ZONE-vpc-roce \
          --subnet-mode custom \
          --mtu=8896

        # Create subnets (each command runs in the background)
        for N in $(seq 0 3); do
          gcloud compute networks subnets create RDMA_NAME_PREFIX-mrdma-sub-$N \
            --network=RDMA_NAME_PREFIX-mrdma \
            --region=REGION \
            --range=192.168.$((N+2)).0/24 &  # offset to avoid overlap with gVNICs
        done

        # Wait for the background subnet creations to finish
        wait

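    Replace the following:

      • GVNIC_NAME_PREFIX: the name prefix to use for the regular VPC networks, subnets, and firewall rules for the gVNICs.
      • RDMA_NAME_PREFIX: the name prefix to use for the RoCE VPC network and subnets for the CX-7 NICs.
      • REGION: the region where you want to create the subnets.
      • ZONE: the zone of the machine type. The RoCE network profile name has the form ZONE-vpc-roce.
      • IP_RANGE: the source IP range to allow in the SSH and ICMP firewall rules.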

  3. Optional: To verify that the VPC network resources are created successfully, check the network settings in the Google Cloud console:
    1. In the Google Cloud console, go to the VPC networks page.
    2. Search the list for the networks that you created in the previous step.
    3. To view the subnets, firewall rules, and other network settings, click the name of the network.
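When reviewing the created subnets, it can help to know which ranges to expect. The following sketch lists the address plan that the scripts in the steps above produce: two gVNIC subnets, followed by four RoCE subnets offset by two so that the ranges don't overlap. The subnet_plan helper and the shortened labels in its output are illustrative only.

```shell
#!/bin/bash
# Sketch only: subnet_plan and its output labels are assumptions.

# Print "<label> <CIDR range>" for every subnet created by the scripts.
subnet_plan() {
  local n
  # gVNIC subnets: 192.168.0.0/24 and 192.168.1.0/24
  for n in 0 1; do
    echo "gvnic-sub-${n} 192.168.${n}.0/24"
  done
  # RoCE subnets: offset by 2 to avoid overlapping the gVNIC ranges
  for n in 0 1 2 3; do
    echo "mrdma-sub-${n} 192.168.$((n + 2)).0/24"
  done
}

subnet_plan
```

All six ranges fall inside 192.168.0.0/16, which is the source range that the internal firewall rules in the gVNIC script allow.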

Create a compact placement policy

Important: To create multiple NVLink domains in a block, we recommend that you create a separate compact placement policy for each NVLink domain. If you reuse a compact placement policy, Compute Engine attempts to place the instances in the same subblock, which is already in use.

To create a compact placement policy, select one of the following options:

gcloud

To create a compact placement policy, use the gcloud beta compute resource-policies create group-placement command:

    gcloud beta compute resource-policies create group-placement POLICY_NAME \
        --collocation=collocated \
        --gpu-topology=1x72 \
        --region=REGION

Replace the following:

  • POLICY_NAME: the name of the compact placement policy.
  • REGION: the region where you want to create the compact placement policy. Specify a region in which the machine type that you want to use is available. For information about regions, see GPU availability by regions and zones.

REST

To create a compact placement policy, make a POST request to the beta resourcePolicies.insert method.

    POST https://compute.googleapis.com/compute/beta/projects/PROJECT_ID/regions/REGION/resourcePolicies

    {
      "name": "POLICY_NAME",
      "groupPlacementPolicy": {
        "collocation": "COLLOCATED",
        "gpuTopology": "1x72"
      }
    }

Replace the following:

  • PROJECT_ID: your project ID.
  • POLICY_NAME: the name of the compact placement policy.
  • REGION: the region where you want to create the compact placement policy. Specify a region in which the machine type that you want to use is available. For information about regions, see GPU availability by regions and zones.
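Because each NVLink domain needs its own compact placement policy, a deployment with several subblocks repeats the gcloud command once per subblock. The following sketch prints one command per policy so that the names can be reviewed first. The policy_commands helper and the numbered naming scheme (PREFIX-1, PREFIX-2, and so on) are hypothetical.

```shell
#!/bin/bash
# Sketch only: policy_commands and the PREFIX-N naming are assumptions.

# Print one "create group-placement" command per NVLink domain.
#   prefix: name prefix for the policies
#   count:  number of subblocks (at most 25 in a single A4X block)
#   region: region in which to create the policies
policy_commands() {
  local prefix="$1" count="$2" region="$3"
  local i
  for i in $(seq 1 "$count"); do
    echo "gcloud beta compute resource-policies create group-placement ${prefix}-${i} --collocation=collocated --gpu-topology=1x72 --region=${region}"
  done
}

policy_commands a4x-policy 3 REGION
```

Piping the output to bash would run the commands. Printing them first is a deliberate choice so that the policy names and region can be checked before any resources are created.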

Create an A4X instance

To obtain a GPU topology of 1x72, create 18 instances. When you create the instances, apply the compact placement policy that specifies the gpuTopology field. Applying the policy ensures that Compute Engine creates all 18 instances in one subblock to use an NVLink domain. If a subblock lacks capacity for an instance, then the request to create the instance fails.
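One way to repeat the single-instance command 18 times is a small loop. The following sketch prints an abbreviated create command for each instance in one NVLink domain. The create_commands helper and the zero-padded naming scheme are hypothetical, and the printed commands include only the machine type, zone, and placement policy flags. The remaining flags from the full command on this page (image, boot disk, network interfaces, reservation, and scheduling) still need to be appended.

```shell
#!/bin/bash
# Sketch only: create_commands and the PREFIX-NN naming are assumptions.
INSTANCES_PER_SUBBLOCK=18

# Print one abbreviated "instances create" command per instance.
#   prefix: name prefix for the instances
#   zone:   zone in which to create the instances
#   policy: compact placement policy with the 1x72 topology
create_commands() {
  local prefix="$1" zone="$2" policy="$3"
  local i
  for i in $(seq 1 "$INSTANCES_PER_SUBBLOCK"); do
    printf 'gcloud compute instances create %s-%02d --machine-type=a4x-highgpu-4g --zone=%s --resource-policies=%s\n' \
      "$prefix" "$i" "$zone" "$policy"
  done
}

create_commands a4x-vm ZONE POLICY_NAME
```

All 18 commands reference the same policy, which is what keeps the instances in one subblock. For a second NVLink domain, use a different policy name and instance prefix.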

To create an A4X instance, select one of the following options.

The following commands also set the access scope for your instances. To simplify permissions management, Google recommends that you set the access scope on an instance to cloud-platform access and then use IAM roles to define what services the instance can access. For more information, see Scopes best practice.

gcloud

To create the VM, use the gcloud compute instances create command.

Important: The following example uses the networking setup for multiple subblocks. If you're creating only a single A4X subblock and you only created the gVNIC network in the Create VPC networks step, then remove the four RDMA subnets indicated by the MRDMA NIC type from the example.

    gcloud compute instances create VM_NAME \
        --machine-type=a4x-highgpu-4g \
        --image-family=IMAGE_FAMILY \
        --image-project=IMAGE_PROJECT \
        --zone=ZONE \
        --boot-disk-type=hyperdisk-balanced \
        --boot-disk-size=DISK_SIZE \
        --scopes=cloud-platform \
        --network-interface=nic-type=GVNIC,network=GVNIC_NAME_PREFIX-net-0,subnet=GVNIC_NAME_PREFIX-sub-0 \
        --network-interface=nic-type=GVNIC,network=GVNIC_NAME_PREFIX-net-1,subnet=GVNIC_NAME_PREFIX-sub-1,no-address \
        --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-0,no-address \
        --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-1,no-address \
        --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-2,no-address \
        --network-interface=nic-type=MRDMA,network=RDMA_NAME_PREFIX-mrdma,subnet=RDMA_NAME_PREFIX-mrdma-sub-3,no-address \
        --reservation-affinity=specific \
        --reservation=RESERVATION \
        --provisioning-model=RESERVATION_BOUND \
        --instance-termination-action=TERMINATION_ACTION \
        --maintenance-policy=TERMINATE \
        --resource-policies=POLICY_NAME

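Replace the following:

  • VM_NAME: the name of the VM.
  • IMAGE_FAMILY: the image family of the OS image that you want to use.
  • IMAGE_PROJECT: the project that contains the OS image.
  • ZONE: the zone in which to create the VM.
  • DISK_SIZE: the size of the boot disk, in GB.
  • GVNIC_NAME_PREFIX: the name prefix that you used for the regular VPC networks and subnets for the gVNICs.
  • RDMA_NAME_PREFIX: the name prefix that you used for the RoCE VPC network and subnets for the CX-7 NICs.
  • RESERVATION: the reservation that you want the VM to consume.
  • TERMINATION_ACTION: whether to stop (STOP) or delete (DELETE) the VM when Compute Engine terminates it.
  • POLICY_NAME: the name of the compact placement policy.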

REST

To create the VM, make a POST request to the instances.insert method.

Important: The following example uses the networking setup for multiple subblocks. If you're creating only a single A4X subblock and you only created the gVNIC network in the Create VPC networks step, then remove the four RDMA subnets indicated by the MRDMA NIC type from the example.

    POST https://compute.googleapis.com/compute/v1/projects/PROJECT_ID/zones/ZONE/instances

    {
      "machineType": "projects/PROJECT_ID/zones/ZONE/machineTypes/a4x-highgpu-4g",
      "name": "VM_NAME",
      "disks": [
        {
          "boot": true,
          "initializeParams": {
            "diskSizeGb": "DISK_SIZE",
            "diskType": "hyperdisk-balanced",
            "sourceImage": "projects/IMAGE_PROJECT/global/images/family/IMAGE_FAMILY"
          },
          "mode": "READ_WRITE",
          "type": "PERSISTENT"
        }
      ],
      "serviceAccounts": [
        {
          "email": "default",
          "scopes": [
            "https://www.googleapis.com/auth/cloud-platform"
          ]
        }
      ],
      "networkInterfaces": [
        {
          "accessConfigs": [
            {
              "name": "external-nat",
              "type": "ONE_TO_ONE_NAT"
            }
          ],
          "network": "projects/NETWORK_PROJECT_ID/global/networks/GVNIC_NAME_PREFIX-net-0",
          "nicType": "GVNIC",
          "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/GVNIC_NAME_PREFIX-sub-0"
        },
        {
          "network": "projects/NETWORK_PROJECT_ID/global/networks/GVNIC_NAME_PREFIX-net-1",
          "nicType": "GVNIC",
          "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/GVNIC_NAME_PREFIX-sub-1"
        },
        {
          "network": "projects/NETWORK_PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
          "nicType": "MRDMA",
          "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-0"
        },
        {
          "network": "projects/NETWORK_PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
          "nicType": "MRDMA",
          "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-1"
        },
        {
          "network": "projects/NETWORK_PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
          "nicType": "MRDMA",
          "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-2"
        },
        {
          "network": "projects/NETWORK_PROJECT_ID/global/networks/RDMA_NAME_PREFIX-mrdma",
          "nicType": "MRDMA",
          "subnetwork": "projects/NETWORK_PROJECT_ID/regions/REGION/subnetworks/RDMA_NAME_PREFIX-mrdma-sub-3"
        }
      ],
      "reservationAffinity": {
        "consumeReservationType": "SPECIFIC_RESERVATION",
        "key": "compute.googleapis.com/reservation-name",
        "values": [
          "RESERVATION"
        ]
      },
      "scheduling": {
        "provisioningModel": "RESERVATION_BOUND",
        "instanceTerminationAction": "TERMINATION_ACTION",
        "onHostMaintenance": "TERMINATE",
        "automaticRestart": true
      },
      "resourcePolicies": [
        "projects/PROJECT_ID/regions/REGION/resourcePolicies/POLICY_NAME"
      ]
    }

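Replace the following:

  • PROJECT_ID: the ID of the project in which to create the VM.
  • ZONE: the zone in which to create the VM.
  • VM_NAME: the name of the VM.
  • DISK_SIZE: the size of the boot disk, in GB.
  • IMAGE_PROJECT: the project that contains the OS image.
  • IMAGE_FAMILY: the image family of the OS image that you want to use.
  • NETWORK_PROJECT_ID: the ID of the project that contains the VPC networks.
  • REGION: the region of the subnets and of the compact placement policy.
  • GVNIC_NAME_PREFIX: the name prefix that you used for the regular VPC networks and subnets for the gVNICs.
  • RDMA_NAME_PREFIX: the name prefix that you used for the RoCE VPC network and subnets for the CX-7 NICs.
  • RESERVATION: the reservation that you want the VM to consume.
  • TERMINATION_ACTION: whether to stop (STOP) or delete (DELETE) the VM when Compute Engine terminates it.
  • POLICY_NAME: the name of the compact placement policy.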

What's next

Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.