AI workloads tutorials overview
To help you run proof-of-concept (POC) AI/ML workloads, this page provides an overview of AI Hypercomputer tutorials that describe the complete process of deploying common AI models on Google Cloud products.
These tutorials are designed for machine learning (ML) engineers, researchers, platform administrators and operators, and data and AI specialists. To use these tutorials effectively, you should have a foundational understanding of machine learning concepts and proficiency with Google Cloud services. Experience with deploying and managing AI models also helps you understand this content.
Tutorial categories
The AI workload tutorials are organized into the following categories:
- Run inference with vLLM on GKE
- Run fine-tuning
- Run training
Run inference with vLLM on Google Kubernetes Engine
These tutorials describe how to deploy and serve large language models (LLMs) for inference using the vLLM serving framework on Google Kubernetes Engine (GKE). You learn to use GKE's container orchestration capabilities for efficient inference workloads. These tutorials cover accessing models using Hugging Face, setting up GKE clusters (for example, in Autopilot mode), handling credentials, and deploying vLLM containers for interaction with LLMs such as Gemma 3, Llama 4, and Qwen3.
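The general flow these tutorials follow can be sketched with a few commands. The cluster name, region, secret name, and manifest file here are illustrative placeholders, not the exact values from any one tutorial:

```shell
# Sketch of the common vLLM-on-GKE flow; all names are illustrative.
# 1. Create a GKE cluster in Autopilot mode.
gcloud container clusters create-auto vllm-demo \
    --region=us-central1

# 2. Store a Hugging Face token so the cluster can pull gated models
#    such as Gemma 3.
kubectl create secret generic hf-secret \
    --from-literal=hf_api_token="${HF_TOKEN}"

# 3. Deploy a vLLM serving container (manifest not shown here) and
#    forward a local port to the service.
kubectl apply -f vllm-deployment.yaml
kubectl port-forward service/vllm-service 8000:8000

# 4. Send a request to the OpenAI-compatible endpoint that vLLM exposes.
curl http://localhost:8000/v1/completions \
    -H "Content-Type: application/json" \
    -d '{"model": "google/gemma-3-27b-it", "prompt": "Hello", "max_tokens": 32}'
```

Each tutorial fills in the model-specific details, such as the accelerator type to request and the vLLM container arguments.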
Run fine-tuning
These tutorials describe how to fine-tune LLMs for specific tasks across various Google Cloud cluster types, including GKE and Slurm. For example, you can fine-tune Gemma 3 on multi-node and multi-GPU GKE clusters (for example, using A4 VM instances with NVIDIA B200 GPUs) and Slurm clusters. You create custom VM images, configure RDMA networks, and execute distributed fine-tuning jobs with libraries like Hugging Face Accelerate and FSDP. Some tutorials also cover using frameworks like Ray for vision-related tasks.
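As a rough sketch of what a distributed fine-tuning launch looks like with Hugging Face Accelerate and FSDP, the command below assumes two nodes with eight GPUs each; the training script name, node counts, and model are illustrative placeholders rather than values from a specific tutorial:

```shell
# Illustrative multi-node fine-tuning launch with Accelerate and FSDP.
# finetune.py is a hypothetical training script; adjust counts to your cluster.
accelerate launch \
    --num_machines=2 \
    --num_processes=16 \
    --use_fsdp \
    --fsdp_sharding_strategy=FULL_SHARD \
    finetune.py --model_name=google/gemma-3-27b-it
```

The tutorials cover the cluster-side setup (custom images, RDMA networking) that has to be in place before a launch like this can run across nodes.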
Run training
These tutorials describe how to train or pre-train LLMs on high-performance clusters. For example, you learn to pre-train models like Qwen2 on multi-node and multi-GPU Slurm clusters with A4 virtual machines. You deploy Slurm clusters using the Google Cloud Cluster Toolkit, create custom VM images, configure shared Filestore instances, configure high-speed RDMA networking, and run distributed pre-training jobs with Hugging Face Accelerate.
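On a Slurm cluster, a pre-training job of this kind is typically submitted as a batch script. The sketch below is a minimal, hypothetical example; the job name, node and GPU counts, and script names are placeholders, not values from a specific tutorial:

```shell
#!/bin/bash
# Illustrative Slurm batch script for multi-node pre-training.
#SBATCH --job-name=qwen2-pretrain
#SBATCH --nodes=2
#SBATCH --gpus-per-node=8
#SBATCH --output=pretrain_%j.log

# Launch one Accelerate process group spanning all nodes; pretrain.py is a
# hypothetical training script.
srun accelerate launch \
    --num_machines="${SLURM_NNODES}" \
    --num_processes=$((SLURM_NNODES * 8)) \
    pretrain.py --model_name=Qwen/Qwen2-72B
```

You would submit this with `sbatch`; the tutorials cover deploying the cluster itself with the Cluster Toolkit and wiring up Filestore and RDMA before jobs are submitted.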
What's next
Explore the AI Hypercomputer tutorials:
- Use vLLM on GKE to serve Gemma 3 27B inference
- Fine-tune Gemma 3 on an A4 GKE cluster
- Train Qwen2 on an A4 Slurm cluster
- Serve Qwen2-72B with vLLM on TPUs
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.