Self-deployed Llama models
Llama is a collection of open models developed by Meta that you can fine-tune and deploy on Vertex AI. Llama offers pre-trained and instruction-tuned generative text and multimodal models.
Llama 4
The Llama 4 family of models is a collection of multimodal models that use the Mixture-of-Experts (MoE) architecture. By using the MoE architecture, models with very large parameter counts can activate a subset of those parameters for any given input, which leads to more efficient inference. Additionally, Llama 4 uses early fusion, which integrates text and vision information from the initial processing stages. This method enables Llama 4 models to more effectively grasp complex, nuanced relationships between text and images. Model Garden on Vertex AI offers two Llama 4 models: Llama 4 Scout and Llama 4 Maverick.
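The routing idea behind MoE can be sketched in a few lines. The following toy example is illustrative only (it is not Meta's implementation; the expert count, hidden size, and all names are made up for the demo): a router scores each token against the experts, and only the top-scoring expert's weights are applied, so most parameters stay inactive for any given token.

```python
# Toy sketch of Mixture-of-Experts (MoE) top-1 routing -- illustrative only,
# not Meta's Llama 4 implementation. Sizes are tiny to keep the demo readable.
import numpy as np

rng = np.random.default_rng(0)

NUM_EXPERTS = 4   # Llama 4 Scout uses 16 experts, Maverick 128; 4 is enough here
D_MODEL = 8       # hidden size, chosen arbitrarily for the sketch

# Each "expert" is a feed-forward weight matrix; the router scores experts per token.
experts = [rng.standard_normal((D_MODEL, D_MODEL)) for _ in range(NUM_EXPERTS)]
router_w = rng.standard_normal((D_MODEL, NUM_EXPERTS))

def moe_layer(token: np.ndarray) -> tuple[np.ndarray, int]:
    """Route one token to its top-1 expert and apply only that expert's weights."""
    scores = token @ router_w        # one score per expert
    chosen = int(np.argmax(scores))  # top-1 routing: pick the highest-scoring expert
    return token @ experts[chosen], chosen

token = rng.standard_normal(D_MODEL)
out, expert_id = moe_layer(token)

# Only one expert's parameters were used for this token.
active = experts[expert_id].size
total = sum(e.size for e in experts)
print(f"token routed to expert {expert_id}; active params: {active} of {total}")
```

The efficiency claim in the text falls out of the last two lines: per token, the layer touches one expert's weights rather than all of them, so active parameters are a fraction of total parameters.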
For more information, see the Llama 4 model card in Model Garden or view the Introducing Llama 4 on Vertex AI blog post.
Llama 4 Maverick
Llama 4 Maverick is the largest and most capable Llama 4 model, offering industry-leading capabilities on coding, reasoning, and image benchmarks. It features 17 billion active parameters out of 400 billion total parameters with 128 experts. Llama 4 Maverick uses alternating dense and MoE layers, where each token activates a shared expert plus one of the 128 routed experts. You can use the model as a pretrained (PT) model or instruction-tuned (IT) model with FP8 support. The model is pretrained on 200 languages and optimized for high-quality chat interactions through a refined post-training pipeline.
Llama 4 Maverick is multimodal and has a 1M token context length. It is suited for advanced image captioning, analysis, precise image understanding, visual Q&A, creative text generation, general-purpose AI assistants, and sophisticated chatbots requiring top-tier intelligence and image understanding.
Llama 4 Scout
Llama 4 Scout delivers state-of-the-art results for its size class with a large 10 million token context window, outperforming previous Llama generations and other open and proprietary models on several benchmarks. It features 17 billion active parameters out of 109 billion total parameters with 16 experts and is available as a pretrained (PT) or instruction-tuned (IT) model. Llama 4 Scout is suited for retrieval tasks within long contexts and tasks that demand reasoning over large amounts of information, such as summarizing multiple large documents, analyzing extensive user interaction logs for personalization, and reasoning across large codebases.
Llama 3.3
Llama 3.3 is a text-only 70B instruction-tuned model that provides enhanced performance relative to Llama 3.1 70B and to Llama 3.2 90B when used for text-only applications. Moreover, for some applications, Llama 3.3 70B approaches the performance of Llama 3.1 405B.
For more information, see the Llama 3.3 model card in Model Garden.
Llama 3.2
Llama 3.2 enables developers to build and deploy the latest generative AI models and applications that use Llama's capabilities to ignite new innovations, such as image reasoning. Llama 3.2 is also designed to be more accessible for on-device applications. The following list highlights Llama 3.2 features:
- Offers a more private and personalized AI experience, with on-device processing for smaller models.
- Offers models that are designed to be more efficient, with reduced latency and improved performance, making them suitable for a wide range of applications.
- Built on top of the Llama Stack, which makes building and deploying applications easier. Llama Stack is a standardized interface for building canonical toolchain components and agentic applications.
- Supports vision tasks, with a new model architecture that integrates image encoder representations into the language model.
The 1B and 3B models are lightweight text-only models that support on-device use cases such as multilingual local knowledge retrieval, summarization, and rewriting.
The 11B and 90B models are small and medium-sized multimodal models with image reasoning. For example, they can analyze visual data from charts to provide more accurate responses and extract details from images to generate text descriptions.
For more information, see the Llama 3.2 model card in Model Garden.
Considerations
When you use the 11B and 90B models, there are no restrictions when you send text-only prompts. However, if you include an image in your prompt, the image must be at the beginning of your prompt, and you can include only one image. You cannot, for example, include some text and then an image.
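The constraints above (at most one image, and image first) can be checked before a request is sent. The following sketch is illustrative only: the content-part dictionaries (`{"type": "image" | "text", ...}`) are an assumed shape for this example, not the exact Vertex AI request schema.

```python
# Illustrative validator for the Llama 3.2 11B/90B prompt constraints described
# above. The part format ({"type": ..., ...}) is a hypothetical shape for this
# sketch, not the actual Vertex AI request schema.
def validate_llama32_vision_prompt(parts: list[dict]) -> None:
    """Raise ValueError if the prompt violates the image-placement rules."""
    image_positions = [i for i, p in enumerate(parts) if p.get("type") == "image"]
    if len(image_positions) > 1:
        raise ValueError("Only one image is allowed per prompt.")
    if image_positions and image_positions[0] != 0:
        raise ValueError("The image must be at the beginning of the prompt.")

# Valid: the single image comes first, followed by text.
validate_llama32_vision_prompt(
    [{"type": "image", "data": "<base64>"},
     {"type": "text", "text": "Describe this chart."}]
)

# Invalid: text before the image raises ValueError.
try:
    validate_llama32_vision_prompt(
        [{"type": "text", "text": "Here is a chart:"},
         {"type": "image", "data": "<base64>"}]
    )
except ValueError as e:
    print(e)
```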
Llama 3.1
The Llama 3.1 collection of multilingual large language models (LLMs) includes pre-trained and instruction-tuned generative models in 8B, 70B, and 405B sizes (text in, text out). The Llama 3.1 instruction-tuned text-only models (8B, 70B, 405B) are optimized for multilingual dialogue use cases and outperform many of the available open source and closed chat models on common industry benchmarks.
For more information, see the Llama 3.1 model card in Model Garden.
Llama 3
The Llama 3 instruction-tuned models are a collection of LLMs optimized for dialogue use cases. Llama 3 models outperform many of the available open source chat models on common industry benchmarks.
For more information, see the Llama 3 model card in Model Garden.
Llama 2
Llama 2 is a collection of pre-trained and fine-tuned generative text models, ranging in size from 7B to 70B parameters.
For more information, see the Llama 2 model card in Model Garden.
Code Llama
Meta's Code Llama models are designed for code synthesis, understanding, and instruction.
For more information, see the Code Llama model card in Model Garden.
Llama Guard 3
Llama Guard 3 builds on the capabilities of Llama Guard 2, adding three new categories: Defamation, Elections, and Code Interpreter Abuse. Additionally, this model is multilingual and has a prompt format that is consistent with Llama 3 or later instruct models.
For more information, see the Llama Guard model card in Model Garden.
Resources
For more information about Model Garden, see Explore AI models in Model Garden.
Last updated 2026-02-19 UTC.