Use Vertex AI RAG Engine in Gemini Live API
Experimental
Some of the RAG features are Experimental offerings, subject to the "Pre-GA Offerings Terms" of the Google Cloud Service Specific Terms. Pre-GA products and features are available "as-is" and may have limited support, and changes to Pre-GA products and features may not be compatible with other Pre-GA versions. For more information, see the launch stage descriptions.
The VPC-SC security controls and CMEK are supported by Vertex AI RAG Engine. Data residency and AXT security controls aren't supported.
Retrieval-augmented generation (RAG) is a technique that's used to retrieve and provide relevant information to LLMs to generate verifiable responses. This information can include fresh data, topic-specific context, or ground truth.
This page shows you how to use Vertex AI RAG Engine with the Gemini Live API, which lets you specify and retrieve information from the RAG corpus.
Prerequisites
Complete the following prerequisites before you use Vertex AI RAG Engine with the Gemini Live API:
- Enable the RAG API in Vertex AI.
- To upload files to the RAG corpus, see the Import RAG files example API. A sketch of the import call follows this list.
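For instance, a minimal import might look like the following sketch using the Vertex AI SDK. The corpus resource name and Cloud Storage path are placeholders, and the exact module path and parameters can vary by SDK version, so treat the linked example as authoritative:

```python
# A hedged sketch of importing files into an existing RAG corpus.
# The corpus name and gs:// path below are placeholders.
import vertexai
from vertexai import rag  # Might be vertexai.preview.rag in older SDK versions

vertexai.init(project="YOUR_PROJECT_ID", location="YOUR_LOCATION")

rag.import_files(
    corpus_name="projects/YOUR_PROJECT_ID/locations/YOUR_LOCATION/ragCorpora/YOUR_CORPUS_ID",
    paths=["gs://your-bucket/your-docs/"],  # Cloud Storage or Google Drive sources
)
```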
Set up
You can use Vertex AI RAG Engine with the Live API by specifying Vertex AI RAG Engine as a tool. The following code sample demonstrates how to specify Vertex AI RAG Engine as a tool:
Replace the following variables:
- YOUR_PROJECT_ID: The ID of your Google Cloud project.
- YOUR_CORPUS_ID: The ID of your corpus.
- YOUR_LOCATION: The region to process the request.
PROJECT_ID="YOUR_PROJECT_ID"RAG_CORPUS_ID="YOUR_CORPUS_ID"LOCATION="YOUR_LOCATION"TOOLS={"retrieval":{"vertex_rag_store":{"rag_resources":{"rag_corpus":"projects/${PROJECT_ID}/locations/${LOCATION}/ragCorpora/${RAG_CORPUS_ID}"}}}}UseWebsocket for real-time communication
To enable real-time communication between a client and a server, use a WebSocket. The following code samples demonstrate how to use a WebSocket with the Python API and the Python SDK.
Python API
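The sample below references a `bearer_token` list that isn't defined in the snippet itself. One way to obtain an access token is with the google-auth library, as in this sketch, assuming Application Default Credentials are configured in your environment:

```python
# Fetch an OAuth 2.0 access token via Application Default Credentials.
# `bearer_token` is wrapped in a list to match the sample below.
import google.auth
import google.auth.transport.requests

creds, _ = google.auth.default(scopes=["https://www.googleapis.com/auth/cloud-platform"])
creds.refresh(google.auth.transport.requests.Request())
bearer_token = [creds.token]
```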
```python
import json

from IPython.display import Markdown, display
from websockets.asyncio.client import connect

CONFIG = {"response_modalities": ["TEXT"], "speech_config": {"language_code": "en-US"}}

headers = {
    "Content-Type": "application/json",
    "Authorization": f"Bearer {bearer_token[0]}",
}

HOST = f"{LOCATION}-aiplatform.googleapis.com"
SERVICE_URL = f"wss://{HOST}/ws/google.cloud.aiplatform.v1beta1.LlmBidiService/BidiGenerateContent"
MODEL = "gemini-2.0-flash-exp"

# Connect to the server
async with connect(SERVICE_URL, additional_headers=headers) as ws:
    # Set up the session
    await ws.send(
        json.dumps(
            {
                "setup": {
                    "model": MODEL,
                    "generation_config": CONFIG,
                    # Set up RAG as a retrieval tool
                    "tools": TOOLS,
                }
            }
        )
    )

    # Receive the setup response
    raw_response = await ws.recv(decode=False)
    setup_response = json.loads(raw_response.decode("ascii"))

    # Send a text message
    text_input = "What are popular LLMs?"
    display(Markdown(f"**Input:** {text_input}"))

    msg = {
        "client_content": {
            "turns": [{"role": "user", "parts": [{"text": text_input}]}],
            "turn_complete": True,
        }
    }

    await ws.send(json.dumps(msg))

    responses = []

    # Receive chunks of the server response
    async for raw_response in ws:
        response = json.loads(raw_response.decode())
        server_content = response.pop("serverContent", None)
        if server_content is None:
            break

        model_turn = server_content.pop("modelTurn", None)
        if model_turn is not None:
            parts = model_turn.pop("parts", None)
            if parts is not None:
                display(Markdown(f"**parts >** {parts}"))
                responses.append(parts[0]["text"])

        # End of turn
        turn_complete = server_content.pop("turnComplete", None)
        if turn_complete:
            grounding_metadata = server_content.pop("groundingMetadata", None)
            if grounding_metadata is not None:
                grounding_chunks = grounding_metadata.pop("groundingChunks", None)
                if grounding_chunks is not None:
                    for chunk in grounding_chunks:
                        display(Markdown(f"**grounding_chunk >** {chunk}"))
            break

# Print the server response
display(Markdown(f"**Response >** {''.join(responses)}"))
```

Python SDK
To learn how to install the generative AI SDK, see Install a library:
```python
from google import genai
from google.genai import types
from google.genai.types import (
    Content,
    LiveConnectConfig,
    HttpOptions,
    Modality,
    Part,
)
from IPython import display

MODEL = "gemini-2.0-flash-exp"

client = genai.Client(vertexai=True, project=PROJECT_ID, location=LOCATION)

async with client.aio.live.connect(
    model=MODEL,
    config=LiveConnectConfig(response_modalities=[Modality.TEXT], tools=TOOLS),
) as session:
    text_input = "What are core LLM techniques?"
    print("> ", text_input, "\n")
    await session.send_client_content(
        turns=Content(role="user", parts=[Part(text=text_input)])
    )

    async for message in session.receive():
        if message.text:
            display.display(display.Markdown(message.text))
            continue
```

Use Vertex AI RAG Engine as the context store
You can use Vertex AI RAG Engine as the context store for the Gemini Live API. Stored session contexts form memories, and the Live API retrieves the past contexts that are related to your conversation to enrich the current context for model generation. You can also use this feature to share contexts across your different Live API sessions.
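Because a memory corpus (created in the following sections) is itself a RAG corpus, you can also query the stored memories directly, outside of a live session. The following is a hedged sketch: retrieval_query is part of the Vertex AI SDK, but its parameters vary across SDK versions, and YOUR_MEMORY_CORPUS_NAME is a placeholder for the full corpus resource name:

```python
# A sketch of inspecting memories stored in a memory corpus.
from vertexai import rag  # Might be vertexai.preview.rag, depending on SDK version

response = rag.retrieval_query(
    rag_resources=[rag.RagResource(rag_corpus="YOUR_MEMORY_CORPUS_NAME")],
    text="What did we discuss in earlier sessions?",
)
for context in response.contexts.contexts:
    print(context.text)
```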
Vertex AI RAG Engine supports storing and indexing the following forms of data from session contexts:
- Text
- Audio speech
Create a MemoryCorpus type corpus
To store and index conversation texts from the session context, you must create a RAG corpus of the MemoryCorpus type. You must also specify an LLM parser in your memory corpus configuration; the parser processes session contexts stored from the Live API to build memories for indexing.
This code sample demonstrates how to create a corpus. However, first replace the variables with values.
```python
from vertexai import rag  # Experimental features might live in vertexai.preview.rag, depending on your SDK version

# Currently supports Google first-party embedding models
EMBEDDING_MODEL = YOUR_EMBEDDING_MODEL  # Such as "publishers/google/models/text-embedding-005"
MEMORY_CORPUS_DISPLAY_NAME = YOUR_MEMORY_CORPUS_DISPLAY_NAME
LLM_PARSER_MODEL_NAME = YOUR_LLM_PARSER_MODEL_NAME  # Such as "projects/{project_id}/locations/{location}/publishers/google/models/gemini-2.5-pro-preview-05-06"

memory_corpus = rag.create_corpus(
    display_name=MEMORY_CORPUS_DISPLAY_NAME,
    corpus_type_config=rag.RagCorpusTypeConfig(
        corpus_type_config=rag.MemoryCorpus(
            llm_parser=rag.LlmParserConfig(
                model_name=LLM_PARSER_MODEL_NAME,
            )
        )
    ),
    backend_config=rag.RagVectorDbConfig(
        rag_embedding_model_config=rag.RagEmbeddingModelConfig(
            vertex_prediction_endpoint=rag.VertexPredictionEndpoint(
                publisher_model=EMBEDDING_MODEL
            )
        )
    ),
)
```

Specify your memory corpus to store contexts
When using your memory corpus with the Live API, specify the memory corpus as a retrieval tool and set store_context to true to allow the Live API to store the session contexts.
This code sample demonstrates how to specify your memory corpus to store contexts. However, first replace the variables with values.
```python
from google import genai
from google.genai import types
from google.genai.types import (
    Content,
    LiveConnectConfig,
    HttpOptions,
    Modality,
    Part,
)
from IPython import display

PROJECT_ID = YOUR_PROJECT_ID
LOCATION = YOUR_LOCATION
TEXT_INPUT = YOUR_TEXT_INPUT
MODEL_NAME = YOUR_MODEL_NAME  # Such as "gemini-2.0-flash-exp"

client = genai.Client(
    vertexai=True,
    project=PROJECT_ID,
    location=LOCATION,
)

memory_store = types.VertexRagStore(
    rag_resources=[
        types.VertexRagStoreRagResource(rag_corpus=memory_corpus.name)
    ],
    store_context=True,
)

async with client.aio.live.connect(
    model=MODEL_NAME,
    config=LiveConnectConfig(
        response_modalities=[Modality.TEXT],
        tools=[types.Tool(retrieval=types.Retrieval(vertex_rag_store=memory_store))],
    ),
) as session:
    text_input = TEXT_INPUT
    await session.send_client_content(
        turns=Content(role="user", parts=[Part(text=text_input)])
    )

    async for message in session.receive():
        if message.text:
            display.display(display.Markdown(message.text))
            continue
```

What's next
- To learn more about Vertex AI RAG Engine, see Vertex AI RAG Engine overview.
- To learn more about the RAG API, see Vertex AI RAG Engine API.
- To manage your RAG corpora, see Corpus management.
- To manage your RAG files, see File management.
- To learn how to use the Vertex AI SDK to run Vertex AI RAG Engine tasks, see RAG quickstart for Python.