Introduction to AI/ML workloads on GKE

This page provides a conceptual overview of Google Kubernetes Engine (GKE) for AI/ML workloads. GKE is a Google-managed implementation of the Kubernetes open source container orchestration platform.

Google Kubernetes Engine provides a scalable, flexible, and cost-effective platform for running all your containerized workloads, including artificial intelligence and machine learning (AI/ML) applications. Whether you're training large foundation models, serving inference requests at scale, or building a comprehensive AI platform, GKE offers the control and performance you need.

This page is for Data and AI specialists, Cloud architects, Operators, and Developers who are looking for a scalable, automated, managed Kubernetes solution to run AI/ML workloads. To learn more about common roles, see Common GKE user roles and tasks.

Get started with AI/ML workloads on GKE

You can start exploring GKE in minutes by using GKE's free tier, which lets you get started with Kubernetes without incurring costs for cluster management.

  1. Get started in Google Cloud console

  2. Try these quickstarts:
    • Inference on GKE: deploy an AI large language model (LLM) on GKE for inference using a pre-defined architecture.
    • Training on GKE: deploy an AI training model on GKE and store the predictions in Cloud Storage.
  3. Read About accelerator consumption options for AI/ML workloads, which has guidance and resources for planning and obtaining accelerators (GPUs and TPUs) for your platform. For a first look at how a workload requests an accelerator, see the sketch after this list.
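Whichever quickstart you follow, the core pattern for obtaining accelerators is the same: you request them in a Pod spec. The following is a minimal, hypothetical sketch of a Pod that asks GKE for a single NVIDIA L4 GPU; the container image and the accelerator type are illustrative assumptions, not values from a specific quickstart. On Autopilot clusters, this node selector alone is enough for GKE to provision a matching node.

```yaml
# Minimal sketch: a Pod that requests one NVIDIA L4 GPU on GKE.
# The image and the accelerator type are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: gpu-inference-example
spec:
  nodeSelector:
    # Ask GKE to schedule onto (or provision) a node with this accelerator.
    cloud.google.com/gke-accelerator: nvidia-l4
  containers:
  - name: inference
    image: us-docker.pkg.dev/example-project/example-repo/model-server:latest  # hypothetical image
    resources:
      limits:
        nvidia.com/gpu: "1"  # number of GPUs the container may use
```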

Common use cases

GKE provides a unified platform that can support all of your AI workloads.

  • Building an AI platform: for enterprise platform teams, GKE provides the flexibility to build a standardized, multi-tenant platform that serves diverse needs.
  • Low-latency online serving: for developers building generative AI applications, GKE with the Inference Gateway provides the optimized routing and autoscaling needed to deliver a responsive user experience while controlling costs. The sketch after this list shows the kind of model-server backend this pattern routes to.
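To make the serving use case concrete, the following is a minimal sketch of the kind of model-server Deployment that a gateway routes traffic to. It assumes a vLLM server image and a small open model; the model name, image tag, and GPU type are illustrative, and a real deployment would also need model-access credentials plus an Inference Gateway (or other load-balancing) configuration in front of it.

```yaml
# Minimal sketch: a vLLM model server that a gateway could route to.
# Model choice, image tag, and GPU type are illustrative assumptions.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: vllm-server
spec:
  replicas: 2
  selector:
    matchLabels:
      app: vllm-server
  template:
    metadata:
      labels:
        app: vllm-server
    spec:
      nodeSelector:
        cloud.google.com/gke-accelerator: nvidia-l4
      containers:
      - name: vllm
        image: vllm/vllm-openai:latest
        args: ["--model", "google/gemma-2b"]   # hypothetical model; gated models need credentials
        ports:
        - containerPort: 8000                  # vLLM's default OpenAI-compatible API port
        resources:
          limits:
            nvidia.com/gpu: "1"
```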

Choose the right platform for your AI/ML workload

Google Cloud offers a spectrum of AI infrastructure products to support your ML journey, from fully managed to fully configurable. Choosing the right platform depends on your specific needs for control, flexibility, and level of management.

Best practice:

Choose GKE when you need deep control, portability, and theability to build a customized, high-performance AI platform.

  • Infrastructure control and flexibility: you require a high degree of control over your infrastructure, need to use custom pipelines, or require kernel-level customizations.
  • Large-scale training and inference: you want to train very large models or serve models with minimal latency, by using GKE's scaling and high performance.
  • Cost efficiency at scale: you want to prioritize cost optimization by using GKE's integration with Spot VMs and Flex-start VMs to effectively manage costs (see the Spot VM sketch after this list).
  • Portability and open standards: you want to avoid vendor lock-in and run your workloads anywhere with Kubernetes, and you already have existing Kubernetes expertise or a multi-cloud strategy.
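As an example of the cost levers mentioned above, the sketch below shows how a workload can opt in to Spot VMs with a node selector; GKE then places the Pod on Spot capacity (on Autopilot it also provisions it). The Job, its image, and its command are hypothetical placeholders, and the pattern assumes a workload that tolerates interruption.

```yaml
# Minimal sketch: a fault-tolerant batch Job that opts in to Spot VMs.
# The container image and command are hypothetical placeholders.
apiVersion: batch/v1
kind: Job
metadata:
  name: spot-training-example
spec:
  template:
    spec:
      nodeSelector:
        cloud.google.com/gke-spot: "true"   # request Spot capacity
      terminationGracePeriodSeconds: 25     # Spot reclamation gives only a short shutdown window
      containers:
      - name: trainer
        image: us-docker.pkg.dev/example-project/example-repo/trainer:latest
        command: ["python", "train.py", "--checkpoint-dir", "/checkpoints"]
      restartPolicy: OnFailure
```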

You can also consider these alternatives:

  • Vertex AI: a fully managed, end-to-end platform to accelerate development and offload infrastructure management. Works well for teams focused on MLOps and rapid time-to-value. For more information, watch Choosing between self-hosted GKE and managed Vertex AI to host AI models.
  • Cloud Run: a serverless platform for containerized inference workloads that can scale to zero. Works well for event-driven applications and serving smaller models cost-effectively. For a comparative deep-dive, see GKE and Cloud Run.

How GKE powers AI/ML workloads

GKE offers a suite of specialized components that simplify and accelerate each stage of the AI/ML lifecycle, from large-scale training to low-latency inference.

In the following diagram, GKE is within Google Cloud and can use different cloud storage options (such as Cloud Storage FUSE and Managed Lustre) and different cloud infrastructure options (such as Cloud TPU and Cloud GPUs). GKE also works with open source software and frameworks for deep learning (such as JAX or TensorFlow), ML orchestration (such as Jupyter or Ray), and LLM inference (such as vLLM or NVIDIA Dynamo).
Figure 1: GKE as a scalable managed platform for AI/ML workloads.

The following sections summarize the GKE features that support your AI/ML workloads or operational goals.

Inference and serving

GKE is optimized to serve AI models elastically, with low latency, high throughput, and cost efficiency. Key features:

  • Accelerator flexibility: GKE supports both GPUs and TPUs for inference. A TPU request is sketched after this list.
  • GKE Inference Gateway: a model-aware gateway that provides intelligent routing and load balancing specifically for AI inference workloads.
  • GKE Inference Quickstart: a tool to simplify performance analysis and deployment by providing a set of benchmarked profiles for popular AI models.
  • GKE Autopilot: a GKE operational mode that automates cluster operations and capacity right-sizing, reducing overhead.
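As a concrete illustration of accelerator flexibility, the following hypothetical sketch requests a single-host TPU slice for an inference Pod. The TPU type, topology, and chip count are illustrative assumptions, and the three values must be mutually consistent for the slice you want.

```yaml
# Minimal sketch: a Pod requesting a single-host TPU v5e slice.
# Accelerator type, topology, and chip count are illustrative assumptions.
apiVersion: v1
kind: Pod
metadata:
  name: tpu-inference-example
spec:
  nodeSelector:
    cloud.google.com/gke-tpu-accelerator: tpu-v5-lite-podslice  # TPU v5e
    cloud.google.com/gke-tpu-topology: 2x4                      # 8-chip, single-host slice
  containers:
  - name: inference
    image: us-docker.pkg.dev/example-project/example-repo/tpu-server:latest  # hypothetical image
    resources:
      limits:
        google.com/tpu: "8"   # must match the chip count of the requested topology
```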
Training and fine-tuning

GKE provides the scale and orchestration capabilities necessary to efficiently train very large models while minimizing costs. Key features:

  • Faster startup nodes: an optimization designed specifically for GPU workloads that reduces node startup times by up to 80%.
  • Flex-start provisioning mode, powered by Dynamic Workload Scheduler: improves your ability to secure scarce GPU and TPU accelerators for short-duration training workloads.
  • Kueue: a Kubernetes-native job queueing system that manages resource allocation, scheduling, quota management, and prioritization for batch workloads. A queued training Job is sketched after this list.
  • TPU Multislice: a hardware and networking architecture that allows multiple TPU slices to communicate with each other over the Data Center Network (DCN) to achieve large-scale training.
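To show how Kueue fits in, here is a minimal, hypothetical sketch: a ResourceFlavor and a ClusterQueue that cap GPU quota, a LocalQueue in a team namespace, and a training Job that submits to the queue by label. Names, quotas, and the trainer image are illustrative; Kueue admits the Job (which starts suspended) only when quota is available.

```yaml
# Minimal sketch of Kueue-managed batch admission.
# Queue names, quotas, and images are illustrative assumptions.
apiVersion: kueue.x-k8s.io/v1beta1
kind: ResourceFlavor
metadata:
  name: default-flavor
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: ClusterQueue
metadata:
  name: team-a-cluster-queue
spec:
  namespaceSelector: {}          # accept workloads from any namespace with a matching LocalQueue
  resourceGroups:
  - coveredResources: ["cpu", "memory", "nvidia.com/gpu"]
    flavors:
    - name: default-flavor
      resources:
      - name: "cpu"
        nominalQuota: 64
      - name: "memory"
        nominalQuota: 256Gi
      - name: "nvidia.com/gpu"
        nominalQuota: 8          # at most 8 GPUs admitted at once
---
apiVersion: kueue.x-k8s.io/v1beta1
kind: LocalQueue
metadata:
  name: team-a-queue
  namespace: team-a
spec:
  clusterQueue: team-a-cluster-queue
---
apiVersion: batch/v1
kind: Job
metadata:
  name: finetune-example
  namespace: team-a
  labels:
    kueue.x-k8s.io/queue-name: team-a-queue  # submit through Kueue
spec:
  suspend: true                  # Kueue unsuspends the Job when it is admitted
  template:
    spec:
      containers:
      - name: trainer
        image: us-docker.pkg.dev/example-project/example-repo/trainer:latest  # hypothetical image
        resources:
          requests:
            nvidia.com/gpu: "1"
          limits:
            nvidia.com/gpu: "1"
      restartPolicy: OnFailure
```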
Unified AI/ML development

GKE offers managed support for Ray, an open-source framework for scaling distributed Python applications. Key feature:

  • Ray on GKE add-on: abstracts Kubernetes infrastructure, letting you scale workloads like large-scale data preprocessing, distributed training, and online serving with minimal code changes.
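For illustration, the following is a minimal, hypothetical RayCluster manifest of the kind that the Ray on GKE add-on (built on the KubeRay operator) reconciles into head and worker Pods. The group sizes, resource requests, and image tag are assumptions, not recommended values.

```yaml
# Minimal sketch: a small Ray cluster managed by the KubeRay operator.
# Replica counts, resources, and the image tag are illustrative assumptions.
apiVersion: ray.io/v1
kind: RayCluster
metadata:
  name: ray-example
spec:
  headGroupSpec:
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-head
          image: rayproject/ray:2.9.0
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
  workerGroupSpecs:
  - groupName: workers
    replicas: 2            # autoscaled between min and max below
    minReplicas: 1
    maxReplicas: 5
    rayStartParams: {}
    template:
      spec:
        containers:
        - name: ray-worker
          image: rayproject/ray:2.9.0
          resources:
            requests:
              cpu: "2"
              memory: 4Gi
```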

What's next
