Recommended configurations

This document provides recommendations for which accelerator, consumption type, storage service, and deployment tool are best suited to different artificial intelligence (AI), machine learning (ML), and high performance computing (HPC) workloads. Use this document to help you identify the best deployment for your workload.

Workloads overview

AI Hypercomputer architecture supports the following use cases:

Pre-training foundation models
  Description: Building a language model by using a large dataset. The result of pre-training is a new model that is good at performing general tasks. Models are categorized based on their size as follows:
    • Frontier model: ML models that span hundreds of billions to trillions of parameters or more. These include large language models (LLMs) such as Gemini.
    • Large model: models that span tens to hundreds of billions of parameters or more.
  Recommendation: See recommendations for pre-training models.

Fine-tuning
  Description: Taking a trained model and adapting it to perform specific tasks by using specialized datasets or other techniques. Fine-tuning is generally performed on large models.
  Recommendation: See recommendations for fine-tuning models.

Inference or serving
  Description: Taking a trained or fine-tuned model and making it available for consumption by users or applications. Inference workloads are categorized based on the size of the models as follows:
    • Multi-host foundation model inference: performing inference with trained ML models that span hundreds of billions to trillions of parameters or more. For these inference workloads, the computational load is shared across multiple host machines.
    • Single-host foundation model inference: performing inference with trained ML models that span tens to hundreds of billions of parameters. For these inference workloads, the computational load is confined to a single host machine.
    • Large model inference: performing inference with trained or fine-tuned ML models that span tens to hundreds of billions of parameters.
  Recommendation: See recommendations for inference.

HPC
  Description: The practice of aggregating computing resources to gain performance greater than that of a single workstation, server, or computer. HPC is used to solve problems in academic research, science, design, simulation, and business intelligence.
  Recommendation: See recommendations for HPC.

Recommendations for pre-training models

Pre-training foundation models involves large clusters of accelerators, continuously reading large volumes of data, and adjusting weights through forward and backward passes to learn from the data. These training jobs run for weeks, or even months, at a time.
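To make that loop concrete, the following is a minimal sketch of a single training step in JAX: a forward pass computes the loss, a backward pass computes gradients, the weights are updated, and a checkpoint is written at a fixed step interval. The toy linear model, checkpoint cadence, and mount path are illustrative assumptions, not recommendations from this document.

```python
import pickle

import jax
import jax.numpy as jnp

def loss_fn(params, batch):
    # Forward pass: a toy linear model stands in for a real transformer.
    inputs, targets = batch
    preds = inputs @ params["w"] + params["b"]
    return jnp.mean((preds - targets) ** 2)

@jax.jit
def train_step(params, batch, lr=1e-3):
    # Backward pass: differentiate the loss with respect to the weights,
    # then apply a plain SGD update.
    loss, grads = jax.value_and_grad(loss_fn)(params, batch)
    params = jax.tree_util.tree_map(lambda p, g: p - lr * g, params, grads)
    return params, loss

def pretrain(params, batches, checkpoint_every=1000):
    for step, batch in enumerate(batches):
        params, loss = train_step(params, batch)
        if step % checkpoint_every == 0:
            # Frequent checkpoints let a weeks-long job resume after a failure.
            # The path assumes a mounted file system such as Managed Lustre.
            with open(f"/mnt/checkpoints/step_{step}.pkl", "wb") as f:
                pickle.dump(jax.device_get(params), f)
    return params
```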

The following sections outline the accelerators, recommended consumption type, and storage service to use when pre-training models.

Recommended accelerators

To pre-train foundation models on Google Cloud, we recommend using the A4X, A4, or A3 accelerator-optimized machines and deploying these machines by using an orchestrator. To deploy these large clusters of accelerators, we also recommend using Cluster Toolkit. To get you started with these clusters, a link to a deployment guide for each recommended machine type is provided in the following table.

Frontier and large model training: A4X, A4, or A3 Ultra
  • GKE: Create an AI-optimized GKE cluster with default configuration
  • Slurm: Create an AI-optimized Slurm cluster

Frontier and large model training: A3 Mega
  • GKE: Maximize GPU network bandwidth in Standard mode clusters
  • Slurm: Deploy an A3 Mega Slurm cluster for ML training

Large model training: A3 High
  • GKE: Maximize GPU network bandwidth in Standard mode clusters
  • Slurm: Deploy an A3 High Slurm cluster

Recommended consumption type

For a high level of assurance in obtaining large clusters of accelerators at minimum cost, we recommend using reservations and requesting them for a long duration. For more information about consumption types, see Choose a consumption option.
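The following sketch shows one way to create such a reservation programmatically with the google-cloud-compute Python client. The reservation name, VM count, and machine type are illustrative assumptions; check the client library reference for the exact message fields before relying on this.

```python
from google.cloud import compute_v1

def reserve_training_capacity(project: str, zone: str) -> None:
    # Illustrative values; size the reservation to your training cluster.
    reservation = compute_v1.Reservation(
        name="pretraining-capacity",
        specific_reservation=compute_v1.AllocationSpecificSKUReservation(
            count=16,  # number of VMs to hold
            instance_properties=compute_v1.AllocationSpecificSKUAllocationReservedInstanceProperties(
                machine_type="a3-highgpu-8g",
            ),
        ),
        # Only VMs that explicitly target this reservation can consume it.
        specific_reservation_required=True,
    )
    client = compute_v1.ReservationsClient()
    operation = client.insert(
        project=project, zone=zone, reservation_resource=reservation
    )
    operation.result()  # block until the reservation is created
```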

Recommended storage services

For pre-training, training data needs to be available continuously and read quickly. We also recommend frequent and fast checkpointing of the model being trained. For most of these needs, we recommend that you use Google Cloud Managed Lustre. Alternatively, you can use Cloud Storage with Cloud Storage FUSE and Anywhere Cache enabled. For more information about storage options, see Storage services.
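Both recommended options present storage as a file system, so training code can read data with ordinary file I/O. A minimal sketch, assuming a hypothetical mount point at /mnt/training-data:

```python
import os

# Hypothetical mount point: a bucket mounted with Cloud Storage FUSE, or a
# Managed Lustre file system mounted at the same location.
DATA_DIR = "/mnt/training-data"

def iter_shards():
    """Yield raw shard bytes using ordinary file I/O; no storage SDK needed."""
    for name in sorted(os.listdir(DATA_DIR)):
        with open(os.path.join(DATA_DIR, name), "rb") as f:
            yield f.read()
```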

Recommendations for fine-tuning models

Fine-tuning large foundation models involves smaller clusters of accelerators, reading moderate volumes of data, and adjusting the model to perform specific tasks. These fine-tuning jobs run for days, or even weeks, at a time.
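As one illustration of adapting a trained model, the sketch below updates only a small task head while keeping the pretrained base weights frozen. The shapes and the head-only technique are illustrative assumptions, not a recommendation of a particular fine-tuning method.

```python
import jax
import jax.numpy as jnp

def loss_fn(head_params, frozen_base, batch):
    inputs, targets = batch
    # The pretrained base supplies features and receives no updates.
    features = jax.nn.relu(inputs @ frozen_base["w"])
    preds = features @ head_params["w_head"]
    return jnp.mean((preds - targets) ** 2)

@jax.jit
def finetune_step(head_params, frozen_base, batch, lr=1e-4):
    # Differentiating with respect to head_params only keeps the base frozen.
    loss, grads = jax.value_and_grad(loss_fn)(head_params, frozen_base, batch)
    head_params = jax.tree_util.tree_map(
        lambda p, g: p - lr * g, head_params, grads
    )
    return head_params, loss
```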

The following sections outline the accelerators, recommended consumption type, and storage service to use when fine-tuning models.

Recommended accelerators

To fine-tune models on Google Cloud, we recommend using the A3 accelerator-optimized machines and deploying these machines by using an orchestrator. To deploy these clusters of accelerators, we also recommend using Cluster Toolkit. To get you started with these clusters, a link to a cluster deployment guide for each recommended machine type is provided in the following table.

Fine-tuning large models: A3 Mega
  • GKE: Maximize GPU network bandwidth in Standard mode clusters
  • Slurm: Deploy an A3 Mega Slurm cluster for ML training

Fine-tuning large models: A3 High
  • GKE: Maximize GPU network bandwidth in Standard mode clusters
  • Slurm: Deploy an A3 High Slurm cluster

Recommended consumption type

For fine-tuning workloads, we recommend using future reservations in calendar mode to provision resources. For more information about consumption options, see Choose a consumption option.

Recommended storage services

For fine-tuning models, the amount of data needed can be significant, and read speed directly affects fine-tuning performance. We also recommend frequent and fast checkpointing of the model being fine-tuned. As with pre-training, for most use cases we recommend Google Cloud Managed Lustre. Alternatively, you can use Cloud Storage with Cloud Storage FUSE and Anywhere Cache enabled. For more information about storage options, see Storage services.

Recommendations for inference

The following sections outline the accelerators, recommended consumption type, and storage service to use when performing inference.

Recommended accelerators

The recommended accelerators for inference depend on whether you're performing multi-host frontier or large model inference, or single-host frontier inference.

Recommended accelerators (multi-host)

To perform multi-host frontier or large model inference on Google Cloud, we recommend using the A4X, A4, or A3 accelerator-optimized machines and deploying these machines by using an orchestrator. To deploy these clusters of accelerators, we also recommend using Cluster Toolkit. To get you started with these clusters, a link to a cluster deployment guide for each recommended machine type is provided in the following table, and a sketch of how multi-host inference shards a model across devices follows the table.

Multi-host frontier inference: A4X, A4, or A3 Ultra
  • GKE: Create an AI-optimized GKE cluster with default configuration
  • Slurm: Create an AI-optimized Slurm cluster

Multi-host frontier inference: A3 Mega
  • GKE: Maximize GPU network bandwidth in Standard mode clusters
  • Slurm: Deploy an A3 Mega Slurm cluster for ML training

Large model inference: A3 High
  • GKE: Maximize GPU network bandwidth in Standard mode clusters
  • Slurm: Deploy an A3 High Slurm cluster
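To illustrate what sharing the computational load across hosts looks like in code, the following minimal JAX sketch shards a toy weight matrix across all visible devices and lets the compiler insert the necessary collectives. It runs as-is on a single host; a real multi-host deployment would first call jax.distributed.initialize() on every host so the device mesh spans the whole cluster. Shapes and axis names are illustrative assumptions.

```python
import numpy as np
import jax
import jax.numpy as jnp
from jax.sharding import Mesh, NamedSharding, PartitionSpec

# Build a mesh over every visible device. In a multi-host job, call
# jax.distributed.initialize() first so this includes remote devices.
mesh = Mesh(np.array(jax.devices()), axis_names=("model",))

# Shard the contraction dimension of the weights across the "model" axis.
# Assumes that dimension divides evenly by the number of devices.
shard = NamedSharding(mesh, PartitionSpec("model", None))
weights = jax.device_put(jnp.ones((4096, 4096)), shard)

@jax.jit
def forward(x, w):
    # Each device multiplies its slice; XLA adds the cross-device reduction.
    return x @ w

out = forward(jnp.ones((8, 4096)), weights)
```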

Recommended accelerators (single host)

The following table outlines the recommended accelerators to use when performing single-host frontier inference. To get you started with these VMs, a link to a VM deployment guide for each recommended machine type is provided.

Single-host frontier inference: A4 or A3 Ultra (no orchestrator required)
  • VM deployment guide: Create an AI-optimized instance

Single-host frontier inference: A3 High (no orchestrator required)
  • VM deployment guide: Create an A3 VM with GPUDirect-TCPX enabled

Recommended consumption type

For inference, we recommend using either a long-running reservation or a future reservation in calendar mode. For more information about consumption options, see Choose a consumption option.

Recommended storage services

For inference, quickly loading the inference binaries and weights across many servers requires fast data reads. We recommend that you use Cloud Storage with Cloud Storage FUSE and Anywhere Cache enabled for model loading. Anywhere Cache provides a zonal data caching solution that accelerates model load times and also reduces network egress fees. When paired with Cloud Storage FUSE, Anywhere Cache is particularly useful for loading models across multiple zones and multi-regions. If you are using Google Cloud Managed Lustre for training, we recommend that you also use it for model loading because it enables fast data reads and is a persistent zonal storage solution. For more information about storage options, see Storage services.
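Because both options expose model artifacts through a file-system path, a server can load weights with ordinary file I/O. A minimal sketch, assuming a hypothetical .npy weights file on a Cloud Storage FUSE or Managed Lustre mount:

```python
import time
import numpy as np

# Hypothetical path on a Cloud Storage FUSE or Managed Lustre mount.
WEIGHTS_PATH = "/mnt/models/llm/weights.npy"

start = time.monotonic()
# Memory-mapping avoids copying the whole file before serving can start;
# pages are faulted in on first access, so fast reads still matter.
weights = np.load(WEIGHTS_PATH, mmap_mode="r")
print(f"Mapped {weights.nbytes / 1e9:.1f} GB in {time.monotonic() - start:.2f}s")
```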

Recommendations for HPC

For HPC workloads, any accelerator-optimized machine series or compute-optimized machine series works well. If you use an accelerator-optimized machine series, the best fit depends on the amount of computation that must be offloaded to the GPU. For a detailed list of recommendations for HPC workloads, see Best practices for running HPC workloads.

To deploy HPC environments, a wider array of cluster blueprints is available. To get started, see Cluster blueprint catalog.

Summary of recommendations

The following table summarizes the accelerator, consumption type, and storage service that we recommend for each workload.

Model pre-training
  • Machine family: Use one of the following accelerator-optimized machine types: A4, A3 Ultra, A3 Mega, or A3 High
  • Consumption type: Use reservations
  • Storage: Use a Google Cloud managed service such as Google Cloud Managed Lustre or Cloud Storage FUSE

Model fine-tuning
  • Machine family: Use one of the following accelerator-optimized machine types: A3 Mega or A3 High
  • Consumption type: Use reservations
  • Storage: Use a Google Cloud managed service such as Google Cloud Managed Lustre or Cloud Storage FUSE

Inference
  • Machine family: Use one of the following accelerator-optimized machine types: A4, A3 Ultra, A3 Mega, or A3 High
  • Consumption type: Use reservations
  • Storage: Use a Google Cloud managed service such as Google Cloud Managed Lustre or Cloud Storage FUSE

HPC
  • See the summary section of Best practices for running HPC workloads

