Overview of HPC clusters with enhanced cluster management capabilities

To create the infrastructure for tightly-coupled applications that scale across multiple nodes, you can create a cluster of virtual machine (VM) instances. This guide provides a high-level overview of the key considerations and steps to configure such a cluster for high performance computing (HPC) workloads using dense resource allocation.

With H4D, Compute Engine adds support for running massive HPC workloads by treating an entire cluster of VM instances as a single computer. Using topology-aware placement of VMs lets you access many instances within a single networking superblock and minimizes network latency. You can also configure Cloud RDMA on these instances to maximize inter-node communication performance, which is crucial for tightly-coupled HPC workloads.

Note: This type of configuration relies on similar features and concepts as those documented in the AI Hypercomputer documentation for accelerator-optimized VMs with GPUs.

You create these HPC VM clusters with H4D by reserving blocks of capacity instead of individual resources. Using blocks of capacity for your cluster enables enhanced cluster management capabilities.

HPC clusters with H4D instances can be created either with or without enhanced cluster management capabilities. If you don't require enhanced cluster management capabilities with your H4D HPC cluster, or if you want to create HPC clusters using a machine series other than H4D, then use the standard instructions for creating HPC instances or clusters instead.

Cluster terminology

When working with blocks of capacity, the following terms are used:

Blocks
Multiple sub-blocks interconnect with a non-blocking fabric, providing a high-bandwidth interconnect. Any CPU within the block is reachable in a maximum of two network hops. The system exposes block and sub-block metadata to orchestrators to enable optimal job placement.
Clusters
Multiple blocks interconnect to form a cluster that scales to thousands of CPUs for running large-scale HPC workloads. Each cluster is globally unique. Communication across different blocks adds only one additional hop, maintaining high performance and predictability, even at a massive scale. Cluster-level metadata is also available to orchestrators for intelligent, large-scale job placement.
Cluster Toolkit
An open source tool offered by Google that simplifies the configuration and deployment for clusters that use either Slurm or Google Kubernetes Engine. You use predefined blueprints to build a deployment folder that is based on the blueprint. You can modify blueprints or the deployment folder to customize deployments and your software stack. You then use Terraform or Packer to run the commands generated by Cluster Toolkit to deploy the cluster.
Dense deployment
A resource request that allocates your compute instance resources physically close to each other to minimize network hops and optimize for the lowest latency.
Network fabric
A network fabric provides high-bandwidth, low-latency connectivity across all blocks and Google Cloud services in a cluster. Jupiter is Google's data center network architecture that leverages software-defined networking and optical circuit switches to evolve the network and optimize its performance.
Node or host
A single physical server machine in the data center. Each host has associated compute resources: CPUs, memory, and network interfaces. The number and configuration of these compute resources depend on the machine family. VM instances are provisioned on top of a physical host.
Orchestrator
An orchestrator automates the management of your clusters. With an orchestrator, you don't have to manage each VM instance in the cluster. An orchestrator, such as Slurm or Google Kubernetes Engine (GKE), handles tasks like job queueing, resource allocation, autoscaling (with GKE), and other day-to-day cluster management tasks.
Sub-blocks
These are foundational units where a group of hosts physically co-locates on a single rack. A Top-of-Rack (ToR) switch connects these hosts, enabling extremely efficient, single-hop communication between any two CPUs within the sub-block. Cloud RDMA facilitates this direct communication.
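Orchestrators and job schedulers can use this block and sub-block information to place tightly-coupled ranks close together. As a minimal, hedged sketch (the zone and instance names are placeholders, and this assumes your instances were created with dense or compact placement so that the `resourceStatus.physicalHost` field is populated), you can inspect where each instance landed with the gcloud CLI:

```
# List the physical topology identifier for each instance in a zone.
# The physicalHost value encodes the placement path (for example,
# segments for block, sub-block, and host); exact formatting can vary.
gcloud compute instances list \
    --zones=us-central1-a \
    --format="table(name, resourceStatus.physicalHost)"

# Inspect a single instance.
gcloud compute instances describe my-h4d-vm \
    --zone=us-central1-a \
    --format="value(resourceStatus.physicalHost)"
```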

Overview of cluster creation process with H4D VMs

To create HPC clusters on reserved blocks of capacity, you must complete the following steps:

  1. Review available provisioning models
  2. Choose a consumption option and obtain capacity
  3. Choose a deployment option and orchestrator
  4. Choose the operating system or cluster image
  5. Create your cluster

Provisioning models for VM and cluster creation

When creating VM instances, you can use the provisioning models described in Compute Engine instances provisioning models.

To create tightly-coupled H4D instances, you must use one of the following provisioning models to obtain the necessary resources for creating compute instances:

  • Reservation-bound: you can reserve resources at a discounted price for a future date and duration. At the start of your reservation period, you can use the reserved resources to create VMs or clusters. You have exclusive access to your reserved resources for the reservation period.

  • Flex-start: you can request discounted resources for up to seven days. Compute Engine makes best-effort attempts to schedule the provisioning of your requested resources as soon as they're available. You have exclusive access to your obtained resources for your requested period.

  • Spot: based on availability, you can immediately obtain deeply discounted resources. However, Compute Engine might stop or delete the VM instances at any time to reclaim capacity.

Reservation-bound provisioning model

The reservation-bound provisioning model links your created VM instances to the capacity that you previously reserved. When you reserve capacity, Compute Engine creates an empty reservation. Then, at the reservation start time, the following occurs:

  • Compute Engine adds your reserved resources to the reservation. You have exclusive access to the reserved capacity until the reservation end time.

  • Google Cloud charges you for the reserved capacity until the end of your reservation period, whether you use the capacity or not.

You can then use the reserved resources to create VMs without additional charges. You only pay for resources that aren't included in the reservation, such as disks or IP addresses.

You can reserve resources for as many VMs as you like, for as long as you like, for a future date. Then, you can use the reserved resources to create and run VMs until the end of the reservation period. If you reserve resources for one year or longer, then you must purchase and attach a resource-based commitment.

To provision resources using the reservation-bound provisioning model with H4D instances, specify the reservation-bound provisioning model when creating individual VMs, an HPC cluster, or a group of VMs.
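As an illustrative, hedged example (the instance name, zone, machine type, and reservation name are placeholders, and the exact set of required flags can differ by reservation type), creating a single VM that consumes a specific reservation might look like this with the gcloud CLI:

```
# Create a VM that targets a specific reservation. RESERVATION_NAME must
# refer to a reservation whose properties (machine type, zone, and so on)
# match the VM you request.
gcloud compute instances create my-h4d-vm \
    --zone=us-central1-a \
    --machine-type=MACHINE_TYPE \
    --reservation-affinity=specific \
    --reservation=RESERVATION_NAME

# Depending on your reservation, you might also need to set the
# reservation-bound provisioning model explicitly (for example,
# --provisioning-model=RESERVATION_BOUND); verify the exact value in the
# current gcloud reference.
```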

Flex-start provisioning model

To run short-duration workloads that require densely allocated resources, you can request compute resources for up to seven days by using Flex-start. Whenever resources are available, Compute Engine creates your requested number of VMs. You can stop standalone Flex-start VMs, but you can't stop Flex-start VMs that a managed instance group (MIG) creates through resize requests. The Flex-start VMs exist until you delete them, or until Compute Engine deletes the VMs at the end of their run duration.

Flex-start is ideal for workloads that can start at any time. The Flex-start provisioning model provisions resources from a secure capacity pool, so the allocated resources are densely allocated to minimize network latency.

When you add Flex-start VMs to a managed instance group (MIG) by using resize requests, the MIG creates the VMs all at once. This approach helps you avoid unnecessary charges for partial capacity that Compute Engine might deliver while you wait for the full capacity needed to start your workload.

You can use Flex-start provisioning with H4D instances, using any available deployment model.
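The following is a minimal sketch only; the instance name, zone, and machine type are placeholders, and the provisioning-model value and companion flags for Flex-start (shown here as FLEX_START with a maximum run duration) are assumptions to verify against the current gcloud reference:

```
# Request a standalone Flex-start VM with a bounded run duration.
# Flex-start VMs are deleted at the end of their run duration.
gcloud compute instances create my-flex-vm \
    --zone=us-central1-a \
    --machine-type=MACHINE_TYPE \
    --provisioning-model=FLEX_START \
    --max-run-duration=7d \
    --instance-termination-action=DELETE
```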

Spot provisioning model

To run fault-tolerant workloads, you can obtain compute resources immediately based on availability. You get resources at the lowest price possible. However, Compute Engine might stop or delete the created Spot VMs at any time to reclaim capacity. This process is called preemption.

Spot VMs are ideal for workloads where interruptions are acceptable,such as:

  • Batch processing
  • High performance computing (HPC)
  • Data analytics
  • Continuous integration and continuous deployment (CI/CD)
  • Media encoding

You can use Spot VMs with any machine type, except A4X, X4, and bare metal machine types. Dense allocation depends on resource availability. To help ensure a closer allocation, you can apply a compact placement policy to the Spot VMs.

Note: Spot VMs are not covered by any Service Level Agreement and are excluded from the Compute Engine SLA.

You can use Spot VMs with the dense deployment options described in Choose a deployment option.
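As a hedged sketch (the policy name, instance name, region, zone, and machine type are placeholders), you might create a compact placement policy and then request a Spot VM that uses it:

```
# Create a compact placement policy so that VMs are allocated close together.
gcloud compute resource-policies create group-placement my-compact-policy \
    --region=us-central1 \
    --collocation=collocated

# Create a Spot VM that uses the placement policy. Compute Engine can stop
# or delete Spot VMs at any time to reclaim capacity.
gcloud compute instances create my-spot-vm \
    --zone=us-central1-a \
    --machine-type=MACHINE_TYPE \
    --provisioning-model=SPOT \
    --instance-termination-action=DELETE \
    --resource-policies=my-compact-policy
```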

Choose a consumption option and obtain capacity

Consumption options determine how resources are obtained for your cluster. To create a cluster that uses enhanced cluster management capabilities, you must request blocks of capacity for a dense deployment.

The following table summarizes the key differences between the consumption options for blocks of capacity:

Note: You can also request a future reservation for more than 90 days. If you need to reserve this capacity, see Reserve capacity through your account team.
| Consumption option | Future reservations for capacity blocks | Future reservations for up to 90 days (in calendar mode) | Flex-start | Spot |
|---|---|---|---|---|
| Workload characteristics | Long-running, large-scale distributed workloads that require densely allocated resources | Short-duration workloads that require densely allocated resources | Short-duration workloads that require densely allocated resources | Fault-tolerant workloads |
| Lifespan | Any time | Up to 90 days | Up to 7 days | Any time, but subject to preemption |
| Preemptible | No | No | No | Yes |
| Capacity assurance | Very high | Very high | Best effort | Best effort |
| Quota | Check that you have enough quota before creating instances. | No quota is charged. | Preemptible quota is charged. | Preemptible quota is charged. |
| Pricing | | | | |
| Resource allocation | Dense | Dense | Dense | Standard (compact placement policy optional) |
| Provisioning model | Reservation-bound | Reservation-bound | Flex-start | Spot |

Creation method

  • Future reservations for capacity blocks: To create HPC clusters and VMs, do the following:

    1. Reserve capacity through your account team.
    2. At your chosen date and time, use the reserved capacity to create HPC clusters. See Choose a deployment option.

  • Future reservations for up to 90 days (in calendar mode): To create HPC clusters and VMs, do the following:

    1. Create a future reservation request in calendar mode.
    2. At your chosen date and time, use the reserved capacity to create HPC clusters. See Choose a deployment option.

  • Flex-start: To create VMs, use any of the supported creation methods. When your requested capacity becomes available, Compute Engine provisions it.

  • Spot: You can immediately create VMs. See Choose a deployment option.
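As an illustrative, hedged sketch of requesting reserved capacity for a future window (the reservation name, zone, machine type, count, and timestamps are placeholders, and the exact flags for calendar-mode requests are assumptions to verify against the current gcloud reference):

```
# Request capacity for a future time window. At the reservation start time,
# Compute Engine delivers the capacity and you can create VMs or clusters
# against it until the end time.
gcloud compute future-reservations create my-future-reservation \
    --zone=us-central1-a \
    --machine-type=MACHINE_TYPE \
    --total-count=16 \
    --start-time=2026-03-01T00:00:00Z \
    --end-time=2026-03-08T00:00:00Z
```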

Choose a deployment option

High performance computing (HPC) workloads aggregate computing resources to gain performance greater than that of a single workstation, server, or computer. HPC is used to solve problems in academic research, science, design, simulation, and business intelligence.

For HPC clusters with enhanced cluster management capabilities, choose the H4D machine series. If you plan to use a different machine series, follow the documentation at Create an HPC-ready VM instance instead of using the deployment methods listed on this page.

Some of the available deployment options include the installation and configuration of an orchestrator for enhanced management of the HPC cluster.

To select the most appropriate way to create your VMs or clusters for your use case, choose one of the following options:

| Option | Use case |
|---|---|
| Cluster Toolkit | You want to use open-source software that simplifies the process of deploying both Slurm and Google Kubernetes Engine (GKE) clusters. Cluster Toolkit is designed to be highly customizable and extensible. A hedged deployment sketch follows this table. |
| GKE | You want maximum flexibility in configuring your Google Kubernetes Engine cluster based on the needs of your workload. To learn more, see Run HPC workloads with H4D. |
| Compute Engine | You want full control of the infrastructure layer so that you can set up your own orchestrator. |
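As a minimal, hedged sketch of the Cluster Toolkit flow (the blueprint file, deployment folder name, and project ID are placeholders, and the binary name can differ between releases; older releases use ghpc while newer ones use gcluster), deploying a Slurm cluster from a predefined blueprint might look like this:

```
# Build the Cluster Toolkit binary from the open-source repository.
git clone https://github.com/GoogleCloudPlatform/cluster-toolkit.git
cd cluster-toolkit && make

# Generate a deployment folder (Terraform and Packer files) from a blueprint.
./gcluster create examples/hpc-slurm.yaml --vars project_id=MY_PROJECT_ID

# Deploy the generated deployment folder.
./gcluster deploy hpc-slurm
```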

Choose the operating system image

The operating system (OS) image you choose depends on the service you use to deploy your cluster.

  • For clusters on GKE: Use a GKE node image, such as Container-Optimized OS. If you use Cluster Toolkit to deploy your GKE cluster, a Container-Optimized OS image is used by default. For more information about node images, see Node images in the GKE documentation.

  • For clusters on Compute Engine: You can use one of the following images:

  • For Slurm clusters: Cluster Toolkit deploys the Slurm cluster with an HPC VM image based on Rocky Linux 8 that is optimized for tightly-coupled HPC workloads.
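If you create Compute Engine instances directly rather than through Cluster Toolkit, you can reference the HPC VM image yourself. As a hedged sketch (the instance name, zone, and machine type are placeholders; the image family and project shown are the commonly published ones for the Rocky Linux 8 HPC VM image, but verify them against the current public image list):

```
# Create a VM from the HPC VM image (Rocky Linux 8 based).
gcloud compute instances create my-hpc-vm \
    --zone=us-central1-a \
    --machine-type=MACHINE_TYPE \
    --image-family=hpc-rocky-linux-8 \
    --image-project=cloud-hpc-image-public
```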

Create your HPC cluster

After you review the cluster creation process and make preliminary decisions for your workload, create your cluster by using any of the deployment options.

Enhanced cluster management capabilities for your HPC cluster

When you create H4D instances with densely allocated resources using the deployment methods mentioned in Choose a deployment option, you can use enhanced HPC cluster management capabilities with your instances.

For more information about these capabilities, see Enhanced HPC cluster management with H4D instances.

What's next
