RAG quickstart

The VPC-SC security controls and CMEK are supported by Vertex AI RAG Engine. Data residency and AXT security controls aren't supported.

Important: By default, a RAG corpus uses RagManagedDb. For more information, see Understanding RagManagedDb.

This page shows you how to use the Vertex AI SDK to run Vertex AI RAG Engine tasks.

You can also follow along using this notebook: Intro to Vertex AI RAG Engine.

Required roles

Grant roles to your user account. Run the following command once for each of the following IAM roles: roles/aiplatform.user

    gcloud projects add-iam-policy-binding PROJECT_ID --member="user:USER_IDENTIFIER" --role=ROLE

Replace the following:

  • PROJECT_ID: Your project ID.
  • USER_IDENTIFIER: The identifier for your user account. For example, myemail@example.com.
  • ROLE: The IAM role that you grant to your user account.
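
Because the command runs once per role, the invocations can be generated with a short script. The following is a minimal sketch in Python, assuming a hypothetical helper named build_binding_command; it only builds the command strings and does not call gcloud.

```python
import shlex

# Roles named in this quickstart; extend the list if your setup needs more.
ROLES = ["roles/aiplatform.user"]


def build_binding_command(project_id: str, user_identifier: str, role: str) -> str:
    """Return the gcloud command that grants `role` to a user on a project.

    Hypothetical helper for illustration; it builds a string and runs nothing.
    """
    args = [
        "gcloud", "projects", "add-iam-policy-binding", project_id,
        f"--member=user:{user_identifier}",
        f"--role={role}",
    ]
    # shlex.join quotes each argument so the result can be pasted into a shell.
    return shlex.join(args)


if __name__ == "__main__":
    # Example values only; replace them with your own project and account.
    for role in ROLES:
        print(build_binding_command("my-project", "myemail@example.com", role))
```

Printing the commands first, rather than executing them, makes it easy to review each binding before applying it.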

Prepare your Google Cloud console

To use Vertex AI RAG Engine, do the following:

  1. Install the Vertex AI SDK for Python.

  2. Run this command in the Google Cloud console to set up your project.

    gcloud config set project PROJECT_ID

  3. Run this command to authorize your login.

    gcloud auth application-default login

Run Vertex AI RAG Engine

Copy and paste this sample code into the Google Cloud console to run Vertex AI RAG Engine.

Python

To learn how to install or update the Vertex AI SDK for Python, see Install the Vertex AI SDK for Python. For more information, see the Python API reference documentation.

    from vertexai import rag
    from vertexai.generative_models import GenerativeModel, Tool
    import vertexai

    # Create a RAG Corpus, Import Files, and Generate a response

    # TODO(developer): Update and un-comment below lines
    # PROJECT_ID = "your-project-id"
    # display_name = "test_corpus"
    # paths = ["https://drive.google.com/file/d/123", "gs://my_bucket/my_files_dir"]  # Supports Google Cloud Storage and Google Drive Links

    # Initialize Vertex AI API once per session
    vertexai.init(project=PROJECT_ID, location="us-east4")

    # Create RagCorpus
    # Configure embedding model, for example "text-embedding-005".
    embedding_model_config = rag.RagEmbeddingModelConfig(
        vertex_prediction_endpoint=rag.VertexPredictionEndpoint(
            publisher_model="publishers/google/models/text-embedding-005"
        )
    )

    rag_corpus = rag.create_corpus(
        display_name=display_name,
        backend_config=rag.RagVectorDbConfig(
            rag_embedding_model_config=embedding_model_config
        ),
    )

    # Import Files to the RagCorpus
    rag.import_files(
        rag_corpus.name,
        paths,
        # Optional
        transformation_config=rag.TransformationConfig(
            chunking_config=rag.ChunkingConfig(
                chunk_size=512,
                chunk_overlap=100,
            ),
        ),
        max_embedding_requests_per_min=1000,  # Optional
    )

    # Direct context retrieval
    rag_retrieval_config = rag.RagRetrievalConfig(
        top_k=3,  # Optional
        filter=rag.Filter(vector_distance_threshold=0.5),  # Optional
    )
    response = rag.retrieval_query(
        rag_resources=[
            rag.RagResource(
                rag_corpus=rag_corpus.name,
                # Optional: supply IDs from `rag.list_files()`.
                # rag_file_ids=["rag-file-1", "rag-file-2", ...],
            )
        ],
        text="What is RAG and why it is helpful?",
        rag_retrieval_config=rag_retrieval_config,
    )
    print(response)

    # Enhance generation
    # Create a RAG retrieval tool
    rag_retrieval_tool = Tool.from_retrieval(
        retrieval=rag.Retrieval(
            source=rag.VertexRagStore(
                rag_resources=[
                    rag.RagResource(
                        rag_corpus=rag_corpus.name,  # Currently only 1 corpus is allowed.
                        # Optional: supply IDs from `rag.list_files()`.
                        # rag_file_ids=["rag-file-1", "rag-file-2", ...],
                    )
                ],
                rag_retrieval_config=rag_retrieval_config,
            ),
        )
    )

    # Create a Gemini model instance
    rag_model = GenerativeModel(
        model_name="gemini-2.0-flash-001", tools=[rag_retrieval_tool]
    )

    # Generate response
    response = rag_model.generate_content("What is RAG and why it is helpful?")
    print(response.text)
    # Example response:
    #   RAG stands for Retrieval-Augmented Generation.
    #   It's a technique used in AI to enhance the quality of responses
    # ...
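
The chunk_size and chunk_overlap settings in the sample control how imported files are split before embedding. The sketch below illustrates the general idea of fixed-size chunking with overlap on a plain Python list. It is not RAG Engine's actual implementation: the function name chunk_with_overlap and the use of list items as the unit of size are assumptions made for illustration.

```python
from typing import List, Sequence


def chunk_with_overlap(
    tokens: Sequence[str], chunk_size: int, chunk_overlap: int
) -> List[List[str]]:
    """Split `tokens` into windows of `chunk_size` that share `chunk_overlap` items."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")

    # Each new window starts this many items after the previous one.
    step = chunk_size - chunk_overlap
    chunks: List[List[str]] = []
    for start in range(0, len(tokens), step):
        chunks.append(list(tokens[start:start + chunk_size]))
        # Stop once a window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


if __name__ == "__main__":
    # Toy input: ten placeholder "tokens".
    words = [f"w{i}" for i in range(10)]
    for chunk in chunk_with_overlap(words, chunk_size=4, chunk_overlap=1):
        print(chunk)
```

Under this scheme, the sample's values (chunk_size=512, chunk_overlap=100) would start each window 412 units after the previous one, so neighboring chunks share context at their boundaries.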

curl

  1. Create a RAG corpus.

      export LOCATION=LOCATION
      export PROJECT_ID=PROJECT_ID
      export CORPUS_DISPLAY_NAME=CORPUS_DISPLAY_NAME

      # CreateRagCorpus
      # Output: CreateRagCorpusOperationMetadata
      curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/ragCorpora \
        -d '{
          "display_name" : "'"CORPUS_DISPLAY_NAME"'"
        }'

    For more information, see Create a RAG corpus example.

  2. Import a RAG file.

      # ImportRagFiles
      # Import a single Cloud Storage file or all files in a Cloud Storage bucket.
      # Input: LOCATION, PROJECT_ID, RAG_CORPUS_ID, GCS_URIS
      export RAG_CORPUS_ID=RAG_CORPUS_ID
      export GCS_URIS=GCS_URIS
      export CHUNK_SIZE=CHUNK_SIZE
      export CHUNK_OVERLAP=CHUNK_OVERLAP
      export EMBEDDING_MODEL_QPM_RATE=EMBEDDING_MODEL_QPM_RATE

      # Output: ImportRagFilesOperationMetadataNumber
      # Use ListRagFiles, or import_result_sink to get the correct rag_file_id.
      curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json" \
        https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID/ragFiles:import \
        -d '{
          "import_rag_files_config": {
            "gcs_source": {
              "uris": "GCS_URIS"
            },
            "rag_file_chunking_config": {
              "chunk_size": CHUNK_SIZE,
              "chunk_overlap": CHUNK_OVERLAP
            },
            "max_embedding_requests_per_min": EMBEDDING_MODEL_QPM_RATE
          }
        }'

    For more information, see Import RAG files example.

  3. Run a RAG retrieval query.

      export RAG_CORPUS_RESOURCE=RAG_CORPUS_RESOURCE
      export VECTOR_DISTANCE_THRESHOLD=VECTOR_DISTANCE_THRESHOLD
      export SIMILARITY_TOP_K=SIMILARITY_TOP_K

    Save the following request body as request.json:

      {
        "vertex_rag_store": {
          "rag_resources": {
            "rag_corpus": "RAG_CORPUS_RESOURCE"
          },
          "vector_distance_threshold": VECTOR_DISTANCE_THRESHOLD
        },
        "query": {
          "text": "TEXT",
          "similarity_top_k": SIMILARITY_TOP_K
        }
      }

    Then send the request:

      curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json; charset=utf-8" \
        -d @request.json \
        "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION:retrieveContexts"

    For more information, see RAG Engine API.

  4. Generate content.

    Save the following request body as request.json:

      {
        "contents": {
          "role": "USER",
          "parts": {
            "text": "INPUT_PROMPT"
          }
        },
        "tools": {
          "retrieval": {
            "disable_attribution": false,
            "vertex_rag_store": {
              "rag_resources": {
                "rag_corpus": "RAG_CORPUS_RESOURCE"
              },
              "similarity_top_k": SIMILARITY_TOP_K,
              "vector_distance_threshold": VECTOR_DISTANCE_THRESHOLD
            }
          }
        }
      }

    Then send the request:

      curl -X POST \
        -H "Authorization: Bearer $(gcloud auth print-access-token)" \
        -H "Content-Type: application/json; charset=utf-8" \
        -d @request.json \
        "https://LOCATION-aiplatform.googleapis.com/v1/projects/PROJECT_ID/locations/LOCATION/publishers/google/models/MODEL_ID:GENERATION_METHOD"

    For more information, see RAG Engine API.
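
The request bodies for steps 3 and 4 can also be generated programmatically, which avoids hand-editing JSON. The following is a minimal sketch using only the Python standard library. The helper names (retrieve_contexts_body, generate_content_body) and the sample values are hypothetical; the field names mirror the JSON in those steps. The script prints a body that can be saved as request.json, and it does not call the API.

```python
import json


def retrieve_contexts_body(
    rag_corpus: str, text: str, top_k: int, distance_threshold: float
) -> dict:
    """Build the request body for the retrieveContexts call (step 3)."""
    return {
        "vertex_rag_store": {
            "rag_resources": {"rag_corpus": rag_corpus},
            "vector_distance_threshold": distance_threshold,
        },
        "query": {"text": text, "similarity_top_k": top_k},
    }


def generate_content_body(
    rag_corpus: str, prompt: str, top_k: int, distance_threshold: float
) -> dict:
    """Build the request body for the content generation call (step 4)."""
    return {
        "contents": {"role": "USER", "parts": {"text": prompt}},
        "tools": {
            "retrieval": {
                "disable_attribution": False,
                "vertex_rag_store": {
                    "rag_resources": {"rag_corpus": rag_corpus},
                    "similarity_top_k": top_k,
                    "vector_distance_threshold": distance_threshold,
                },
            }
        },
    }


if __name__ == "__main__":
    # Placeholder resource name; substitute your own project, location, and corpus ID.
    corpus = "projects/PROJECT_ID/locations/LOCATION/ragCorpora/RAG_CORPUS_ID"
    body = retrieve_contexts_body(corpus, "What is RAG and why it is helpful?", 3, 0.5)
    # Redirect this output to request.json to use it with `curl -d @request.json`.
    print(json.dumps(body, indent=2))
```

Building the body as a dictionary and serializing it with json.dumps guarantees valid JSON, including correct quoting of the query text.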

What's next


Last updated 2026-02-19 UTC.