Run AI inference on Cloud Run with GPUs
Use GPUs to run AI inference on Cloud Run. If you are new to AI concepts, see GPUs for AI. GPUs are used to train and run AI models, and can give you more stable performance along with the ability to scale workloads based on your overall utilization. See GPU support for services, jobs, and worker pools to learn more about GPU configurations.
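As a minimal sketch of attaching a GPU to a service, the gcloud beta CLI exposes GPU flags on deploy. The service name, image URL, and region below are placeholders; check the current gcloud reference for the exact flags available in your release channel:

```shell
# Deploy a Cloud Run service with one NVIDIA L4 GPU attached.
# SERVICE_NAME and IMAGE_URL are placeholders; pick a region where
# the nvidia-l4 GPU type is available.
gcloud beta run deploy SERVICE_NAME \
  --image=IMAGE_URL \
  --region=us-central1 \
  --gpu=1 \
  --gpu-type=nvidia-l4 \
  --no-cpu-throttling
```

GPU-enabled services require CPU to be always allocated, which is why `--no-cpu-throttling` is set alongside the GPU flags.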
Tutorials for services
- Run LLM inference on Cloud Run GPUs with Gemma 3 and Ollama
- Run Gemma 3 on Cloud Run
- Run LLM inference on Cloud Run GPUs with vLLM
- Run OpenCV on Cloud Run with GPU acceleration
- Run LLM inference on Cloud Run GPUs with Hugging Face Transformers.js
- Run LLM inference on Cloud Run GPUs with Hugging Face TGI
Tutorials for jobs
- Fine-tune LLMs using GPUs with Cloud Run jobs
- Run batch inference using GPUs on Cloud Run jobs
- GPU-accelerated video transcoding with FFmpeg on Cloud Run jobs
Last updated 2026-02-18 UTC.