Run LLM inference on Cloud Run GPUs with vLLM
The following codelab shows how to run a backend service that runs vLLM, an inference engine for production systems, serving Google's Gemma 2, a 2-billion-parameter instruction-tuned model.
See the entire codelab at Run LLM inference on Cloud Run GPUs with vLLM.
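As a minimal sketch of what querying such a service might look like: vLLM can expose an OpenAI-compatible HTTP API, so a deployed Cloud Run service could be called as shown below. The service URL here is a hypothetical placeholder, and the endpoint and payload assume the service was started with vLLM's OpenAI-compatible server; see the codelab for the actual deployment steps.

```python
import requests

# Hypothetical placeholder; replace with your deployed Cloud Run service URL.
SERVICE_URL = "https://vllm-gemma-xxxxx-uc.a.run.app"

# Assumes the container runs vLLM's OpenAI-compatible server, which serves
# completions at /v1/completions.
response = requests.post(
    f"{SERVICE_URL}/v1/completions",
    json={
        "model": "google/gemma-2-2b-it",  # Gemma 2 2B instruction-tuned model
        "prompt": "Explain Cloud Run in one sentence.",
        "max_tokens": 128,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```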