Vertex AI RAG Engine supported models

VPC-SC security controls and CMEK are supported by Vertex AI RAG Engine. Data residency and AXT security controls aren't supported.

This page lists Gemini models, self-deployed models, and models with managed APIs on Vertex AI that support Vertex AI RAG Engine.

Gemini models

The following models support Vertex AI RAG Engine:

Fine-tuned Gemini models are unsupported when the Gemini models use Vertex AI RAG Engine.

Self-deployed models

Vertex AI RAG Engine supports all models in Model Garden.

Use Vertex AI RAG Engine with your self-deployed open model endpoints.

Replace the variables used in the code sample:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • ENDPOINT_ID: Your endpoint ID.

    # Create a model instance with your self-deployed open model endpoint
    rag_model = GenerativeModel(
        "projects/PROJECT_ID/locations/LOCATION/endpoints/ENDPOINT_ID",
        tools=[rag_retrieval_tool],
    )
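As a quick illustration, the fully qualified endpoint name passed to `GenerativeModel` can be assembled from the three variables above. This is a minimal sketch; `endpoint_resource_name` is a hypothetical helper for illustration, not part of the Vertex AI SDK:

```python
def endpoint_resource_name(project_id: str, location: str, endpoint_id: str) -> str:
    """Assemble the fully qualified endpoint resource name for GenerativeModel.

    Hypothetical helper for illustration; not part of the Vertex AI SDK.
    """
    return f"projects/{project_id}/locations/{location}/endpoints/{endpoint_id}"

# Example with fictitious values:
print(endpoint_resource_name("my-project", "us-central1", "1234567890"))
# projects/my-project/locations/us-central1/endpoints/1234567890
```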

Models with managed APIs on Vertex AI

The models with managed APIs on Vertex AI that support Vertex AI RAG Engine include the following:

The following code sample demonstrates how to use the Gemini GenerateContent API to create a generative model instance. The model ID, /publisher/meta/models/llama-3.1-405B-instruct-maas, is found in the model card.

Replace the variables used in the code sample:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • RAG_RETRIEVAL_TOOL: Your RAG retrieval tool.

    # Create a model instance with Llama 3.1 MaaS endpoint
    rag_model = GenerativeModel(
        "projects/PROJECT_ID/locations/LOCATION/publisher/meta/models/llama-3.1-405B-instruct-maas",
        tools=RAG_RETRIEVAL_TOOL,
    )
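The managed-model resource name follows the same pattern as the model ID shown above. As a sketch, with `maas_model_resource_name` being a hypothetical helper (not part of the SDK) that mirrors that format:

```python
def maas_model_resource_name(project_id: str, location: str, publisher: str, model_id: str) -> str:
    """Assemble a managed-API model resource name in the format shown above.

    Hypothetical helper for illustration; not part of the Vertex AI SDK.
    """
    return f"projects/{project_id}/locations/{location}/publisher/{publisher}/models/{model_id}"

# Example with fictitious project values:
print(maas_model_resource_name("my-project", "us-central1", "meta", "llama-3.1-405B-instruct-maas"))
# projects/my-project/locations/us-central1/publisher/meta/models/llama-3.1-405B-instruct-maas
```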

The following code sample demonstrates how to use the OpenAI-compatible ChatCompletions API to generate a model response.

Replace the variables used in the code sample:

  • PROJECT_ID: Your project ID.
  • LOCATION: The region to process your request.
  • MODEL_ID: The LLM model for content generation. For example, meta/llama-3.1-405b-instruct-maas.
  • INPUT_PROMPT: The text sent to the LLM for content generation. Use a prompt relevant to the documents in Vertex AI Search.
  • RAG_CORPUS_ID: The ID of the RAG corpus resource.
  • ROLE: Your role.
  • USER: Your username.
  • CONTENT: Your content.

    # Generate a response with Llama 3.1 MaaS endpoint
    response = client.chat.completions.create(
        model="MODEL_ID",
        messages=[{"ROLE": "USER", "content": "CONTENT"}],
        extra_body={
            "extra_body": {
                "google": {
                    "vertex_rag_store": {
                        "rag_resources": {"rag_corpus": "RAG_CORPUS_ID"},
                        "similarity_top_k": 10,
                    }
                }
            }
        },
    )
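The nested extra_body payload is easy to mistype. As a minimal sketch, it can be built with a small function; `build_rag_extra_body` is a hypothetical helper for illustration, not part of the OpenAI SDK or the Vertex AI API:

```python
def build_rag_extra_body(rag_corpus_id: str, similarity_top_k: int = 10) -> dict:
    """Build the vertex_rag_store payload passed via extra_body, as shown above.

    Hypothetical helper for illustration; not part of any SDK.
    """
    return {
        "extra_body": {
            "google": {
                "vertex_rag_store": {
                    "rag_resources": {"rag_corpus": rag_corpus_id},
                    "similarity_top_k": similarity_top_k,
                }
            }
        }
    }

# The result is passed directly as the extra_body argument of
# client.chat.completions.create(...).
payload = build_rag_extra_body("my-corpus-id")
print(payload["extra_body"]["google"]["vertex_rag_store"]["similarity_top_k"])
# 10
```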

What's next


Last updated 2025-12-17 UTC.