AI workloads tutorials overview
To help you run proof-of-concept (POC) AI/ML workloads, this page provides an overview of AI Hypercomputer tutorials that describe the complete process of deploying common AI models on Google Cloud products.
These tutorials are designed for machine learning (ML) engineers, researchers, platform administrators and operators, and data and AI specialists. To use these tutorials effectively, you should have a foundational understanding of machine learning concepts and proficiency with Google Cloud services. Experience with deploying and managing AI models also helps you understand this content.
Tutorial categories
The AI workload tutorials are organized into the following categories:
- Run inference with vLLM on GKE
- Run fine-tuning
- Run training
Run inference with vLLM on Google Kubernetes Engine
These tutorials describe how to deploy and serve large language models (LLMs) for inference using the vLLM serving framework on Google Kubernetes Engine (GKE). You learn to use GKE's container orchestration capabilities for efficient inference workloads. These tutorials cover accessing models using Hugging Face, setting up GKE clusters (for example, in Autopilot mode), handling credentials, and deploying vLLM containers for interaction with LLMs such as Gemma 3, Llama 4, and Qwen3.
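The general flow these tutorials follow can be sketched with a few commands. The cluster name, region, secret name, and manifest file here are illustrative placeholders, not the exact values from any one tutorial:

```shell
# Sketch of the common vLLM-on-GKE flow; all names are illustrative.
# 1. Create a GKE cluster in Autopilot mode.
gcloud container clusters create-auto vllm-demo \
    --region=us-central1

# 2. Store a Hugging Face token so the cluster can pull gated models
#    such as Gemma 3.
kubectl create secret generic hf-secret \
    --from-literal=hf_api_token="${HF_TOKEN}"

# 3. Deploy a vLLM serving container (manifest not shown here) and
#    forward a local port to the service.
kubectl apply -f vllm-deployment.yaml
kubectl port-forward service/vllm-service 8000:8000

# 4. Send a request to the OpenAI-compatible endpoint that vLLM exposes.
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "google/gemma-3-27b-it", "prompt": "Hello", "max_tokens": 32}'
```

Each tutorial fills in the model-specific details, such as the accelerator type to request and the vLLM container arguments.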
Run fine-tuning
These tutorials describe how to fine-tune LLMs for specific tasks across various Google Cloud cluster types, including GKE and Slurm. For example, you can fine-tune Gemma 3 on multi-node and multi-GPU GKE clusters (for example, using A4 VM instances with NVIDIA B200 GPUs) and Slurm clusters. You create custom VM images, configure RDMA networks, and execute distributed fine-tuning jobs with libraries like Hugging Face Accelerate and FSDP. Some tutorials also cover using frameworks like Ray for vision-related tasks.
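As a rough sketch of what a distributed fine-tuning launch looks like with Hugging Face Accelerate and FSDP, the command below assumes two nodes with eight GPUs each; the training script name, node counts, and model are illustrative placeholders rather than values from a specific tutorial:

```shell
# Illustrative multi-node fine-tuning launch with Accelerate and FSDP.
# finetune.py is a hypothetical training script; adjust counts to your cluster.
accelerate launch \
    --num_machines=2 \
    --num_processes=16 \
    --use_fsdp \
    --fsdp_sharding_strategy=FULL_SHARD \
    finetune.py --model_name=google/gemma-3-27b-it
```

The tutorials cover the cluster-side setup (custom images, RDMA networking) that has to be in place before a launch like this can run across nodes.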
Run training
These tutorials describe how to train or pre-train LLMs on high-performance clusters. For example, you learn to pre-train models like Qwen2 on multi-node and multi-GPU Slurm clusters with A4 virtual machines. You deploy Slurm clusters using the Google Cloud Cluster Toolkit, create custom VM images, configure shared Filestore instances, configure high-speed RDMA networking, and run distributed pre-training jobs with Hugging Face Accelerate.
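On a Slurm cluster, a pre-training job of this kind is typically submitted as a batch script. The sketch below is a minimal, hypothetical example; the job name, node and GPU counts, and script names are placeholders, not values from a specific tutorial:

```shell
#!/bin/bash
# Illustrative Slurm batch script for multi-node pre-training.
#SBATCH --job-name=qwen2-pretrain
#SBATCH --nodes=2
#SBATCH --gpus-per-node=8
#SBATCH --output=pretrain_%j.log

# Launch one Accelerate process group spanning all nodes; pretrain.py is a
# hypothetical training script.
srun accelerate launch \
    --num_machines="${SLURM_NNODES}" \
    --num_processes=$((SLURM_NNODES * 8)) \
    pretrain.py --model_name=Qwen/Qwen2-72B
```

You would submit this with `sbatch`; the tutorials cover deploying the cluster itself with the Cluster Toolkit and wiring up Filestore and RDMA before jobs are submitted.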
What's next
Explore the AI Hypercomputer tutorials:
- Use vLLM on GKE to serve Gemma 3 27B inference
- Fine-tune Gemma 3 on an A4 GKE cluster
- Train Qwen2 on an A4 Slurm cluster
- Serve Qwen2-72B with vLLM on TPUs
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.