Run LLM inference on Cloud Run GPUs with vLLM
The following codelab shows how to run a backend service that runs vLLM, an inference engine for production systems, serving Google's Gemma 2, a 2-billion-parameter instruction-tuned model.
See the entire codelab at Run LLM inference on Cloud Run GPUs with vLLM.
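As a minimal sketch of what querying such a service might look like: vLLM can expose an OpenAI-compatible HTTP API, so a deployed Cloud Run service could be called as shown below. The service URL here is a hypothetical placeholder, and the endpoint and payload assume the service was started with vLLM's OpenAI-compatible server; see the codelab for the actual deployment steps.

```python
import requests

# Hypothetical placeholder; replace with your deployed Cloud Run service URL.
SERVICE_URL = "https://vllm-gemma-xxxxx-uc.a.run.app"

# Assumes the container runs vLLM's OpenAI-compatible server, which serves
# completions at /v1/completions.
response = requests.post(
    f"{SERVICE_URL}/v1/completions",
    json={
        "model": "google/gemma-2-2b-it",  # Gemma 2 2B instruction-tuned model
        "prompt": "Explain Cloud Run in one sentence.",
        "max_tokens": 128,
    },
    timeout=60,
)
response.raise_for_status()
print(response.json()["choices"][0]["text"])
```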