Vertex AI APIs for building search and RAG experiences

Vertex AI offers a suite of APIs to help you build Retrieval-Augmented Generation (RAG) applications or a search engine. This page introduces those APIs.

Retrieval and generation

RAG is a methodology that enables Large Language Models (LLMs) to generate responses that are grounded to your data source of choice. There are two stages in RAG:

Retrieval: Getting the most relevant facts quickly is a common search problem. With RAG, you can quickly retrieve the facts that are important to generate an answer.
Generation: The retrieved facts are used by the LLM to generate a grounded response.

Vertex AI offers options for both stages to match a variety of developer needs.
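
The two stages can be illustrated with a minimal sketch in plain Python. Nothing here calls a Vertex AI API: the Fact class, the retrieve and build_grounded_prompt functions, the sample corpus, and the word-overlap scoring are all hypothetical stand-ins chosen for illustration. The sketch only shows how retrieved facts become the input that constrains the generation stage.

```python
from dataclasses import dataclass


@dataclass
class Fact:
    source: str  # hypothetical identifier of the document the fact came from
    text: str


def retrieve(query: str, corpus: list[Fact], top_k: int = 2) -> list[Fact]:
    """Stage 1 (toy stand-in): rank facts by how many query words they share."""
    query_words = set(query.lower().split())
    scored = [(len(query_words & set(f.text.lower().split())), f) for f in corpus]
    # Drop facts with no overlap, then sort by score only (stable for ties).
    scored = [pair for pair in scored if pair[0] > 0]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [fact for _, fact in scored[:top_k]]


def build_grounded_prompt(query: str, facts: list[Fact]) -> str:
    """Stage 2 input: a prompt that restricts the model to the retrieved facts."""
    numbered = "\n".join(
        f"[{i}] ({f.source}) {f.text}" for i, f in enumerate(facts, 1)
    )
    return (
        "Answer using only the numbered facts and cite them by number.\n"
        f"Facts:\n{numbered}\n"
        f"Question: {query}"
    )


# Made-up sample data.
corpus = [
    Fact("handbook.pdf", "Employees accrue 20 vacation days per year."),
    Fact("handbook.pdf", "The office closes at 6 pm on Fridays."),
    Fact("faq.html", "Unused vacation days expire in March."),
]
question = "How many vacation days do employees get?"
facts = retrieve(question, corpus)
prompt = build_grounded_prompt(question, facts)
print(prompt)
```

In a real application, the prompt would be sent to an LLM, and the toy retriever would be replaced by one of the retrieval options described on this page.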

Retrieval

Choose the best retrieval method for your needs:

Vertex AI Search: Vertex AI Search is a Google Search-quality information retrieval engine that can be a component of any generative AI application that uses your enterprise data. Vertex AI Search works as an out-of-the-box semantic and keyword search engine for RAG. It can process a variety of document types and has connectors to a variety of source systems, including BigQuery and many third-party systems.
For more information, see Vertex AI Search.
Build your own retrieval: If you want to build your own semantic search, you can rely on Vertex AI APIs for components of your custom RAG system. This suite of APIs provides high-quality implementations for document parsing, embedding generation, vector search, and semantic ranking. Using these lower-level APIs gives you full flexibility in the design of your retriever while still accelerating time to market and maintaining high quality.
For more information, see Build your own Retrieval Augmented Generation.
Bring an existing retrieval: You can use your existing search as a retriever for grounded generation. You can also use the Vertex APIs for RAG to upgrade your existing search to higher quality. For more information, see Grounding overview.
Vertex AI RAG Engine: Vertex AI RAG Engine provides a fully managed runtime for RAG orchestration, which lets developers build RAG for use in production and enterprise-ready contexts.
For more information, see Vertex AI RAG Engine overview in the Generative AI on Vertex AI documentation.
Google Search: When you use Grounding with Google Search for your Gemini model, Gemini uses Google Search and generates output that is grounded to the relevant search results. This retrieval method doesn't require management, and it makes the world's knowledge available to Gemini.
For more information, see Grounding with Google Search in the Generative AI on Vertex AI documentation.

Generation

Choose the best generation method for your needs:

Ground with your data: Generate well-grounded answers to a user's query. The grounded generation API uses specialized, fine-tuned Gemini models and is an effective way to reduce hallucinations and provide responses grounded to your sources or third-party sources, including references to grounding support content.
For more information, see Generate grounded answers with RAG.
You can also ground responses to your Vertex AI Search data using Generative AI on Vertex AI. For more information, see Ground with your data.
Ground with Google Search: Gemini is Google's most capable model and offers out-of-the-box grounding with Google Search. You can use it to build your fully customized grounded generation solution.
For more information, see Grounding with Google Search in the Generative AI on Vertex AI documentation.
Model Garden: If you want full control and the model of your choice, you can use any of the models in Vertex AI Model Garden for generation.

Build your own Retrieval Augmented Generation

Developing a custom RAG system for grounding offers flexibility and control at every step of the process. Vertex AI offers a suite of APIs to help you create your own search solutions. Using those APIs gives you full flexibility in the design of your RAG application while still accelerating time to market and maintaining high quality.

The Document AI Layout Parser: The Document AI Layout Parser transforms documents in various formats into structured representations. It makes content such as paragraphs, tables, and lists, and structural elements such as headings, page headers, and footers, accessible, and it creates context-aware chunks that facilitate information retrieval in a range of generative AI and discovery apps.
For more information, see Document AI Layout Parser in the Document AI documentation.
Embeddings API: The Vertex AI embeddings APIs let you create embeddings for text or multimodal inputs. Embeddings are vectors of floating point numbers that are designed to capture the meaning of their input. You can use the embeddings to power semantic search using Vector Search.
For more information, see Text embeddings and Multimodal embeddings in the Generative AI on Vertex AI documentation.
Vector Search: The retrieval engine is a key part of your RAG or search application. Vertex AI Vector Search is a retrieval engine that can search from billions of semantically similar or semantically related items at scale, with high queries per second (QPS), high recall, low latency, and cost efficiency. It can search over dense embeddings, and supports sparse embedding keyword search and hybrid search in Public preview.
For more information, see Overview of Vertex AI Vector Search in the Vertex AI documentation.
The ranking API: The ranking API takes in a list of documents and reranks those documents based on how relevant the documents are to a given query. Compared to embeddings, which look purely at the semantic similarity of a document and a query, the ranking API can give you a more precise score for how well a document answers a given query.
For more information, see Improve search and RAG quality with ranking API.
The grounded generation API: Use the grounded generation API to generate well-grounded answers to a user's prompt. The grounding sources can be your Vertex AI Search data stores, custom data that you provide, or Google Search.
For more information, see Generate grounded answers.
The generate content API: Use the generate content API to generate well-grounded answers to a user's prompt. The grounding sources can be your Vertex AI Search data stores or Google Search.
For more information, see Ground with Google Search or Ground with your data.
The check grounding API: The check grounding API determines how grounded a given piece of text is in a given set of reference texts. The API can generate supporting citations from the reference text to indicate where the given text is supported by the reference texts. Among other things, the API can be used to assess the groundedness of responses from a RAG system. Additionally, as an experimental feature, the API also generates contradicting citations that show where the given text and reference texts disagree.
For more information, see Check grounding.
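
To make the role of embeddings in semantic search concrete, the following sketch ranks items by cosine similarity between vectors. It is an illustration only and does not use the embeddings API or Vector Search: the three-dimensional vectors, the item IDs, and the cosine_similarity and nearest functions are made up for this example, and the brute-force scan stands in for the indexing that a managed retrieval engine would provide at scale.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def nearest(query_vec: list[float], index: dict[str, list[float]], top_k: int = 2) -> list[str]:
    """Brute-force nearest neighbors: score every item, keep the best top_k."""
    scored = sorted(
        index.items(),
        key=lambda item: cosine_similarity(query_vec, item[1]),
        reverse=True,
    )
    return [item_id for item_id, _ in scored[:top_k]]


# Made-up vectors for illustration. Real embeddings are produced by a model
# and have far more dimensions.
index = {
    "refund-policy": [0.9, 0.1, 0.0],
    "shipping-times": [0.1, 0.9, 0.1],
    "returns-howto": [0.8, 0.3, 0.1],
}
query_vec = [1.0, 0.2, 0.0]  # hypothetical embedding of a refund question

top = nearest(query_vec, index)
print(top)
```

A similarity score like this measures only how close two vectors are. A reranking step, as described for the ranking API, can then reorder the retrieved items by how well each one answers the query.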

Workflow: Generate grounded responses from unstructured data

Here's a workflow that outlines how to integrate the Vertex AI RAG APIs to generate grounded responses from unstructured data.

Import your unstructured documents, such as PDF files, HTML files, or images with text, into a Cloud Storage location.
Process the imported documents using the layout parser. The layout parser breaks down the unstructured documents into chunks and transforms the unstructured content into its structured representation. The layout parser also extracts annotations from the chunks.
Create text embeddings for the chunks using the Vertex AI text embeddings API.
Index and retrieve the chunk embeddings using Vector Search.
Rank the chunks using the ranking API and determine the top-ranked chunks.
Generate grounded answers based on the top-ranked chunks using the grounded generation API or using the generate content API.
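
The order of the steps above can be sketched as a small orchestration function. This is a sketch under stated assumptions, not a Vertex AI client: run_rag_workflow and its parse, embed, rank, and generate parameters are hypothetical names, and the toy_ functions are trivial local stand-ins for the layout parser, the text embeddings API, the ranking API, and a generation API. An in-memory list plays the role of the vector index, and a dictionary plays the role of the imported documents.

```python
from typing import Callable

Vector = list[float]


def dot(a: Vector, b: Vector) -> float:
    return sum(x * y for x, y in zip(a, b))


def run_rag_workflow(
    documents: dict[str, str],
    question: str,
    parse: Callable[[str], list[str]],
    embed: Callable[[str], Vector],
    rank: Callable[[str, list[str]], list[str]],
    generate: Callable[[str, list[str]], str],
    retrieve_k: int = 3,
    keep_k: int = 2,
) -> str:
    # Parse: break each imported document into chunks.
    chunks = [chunk for text in documents.values() for chunk in parse(text)]
    # Embed and index: pair every chunk with its vector (in-memory stand-in).
    index = [(chunk, embed(chunk)) for chunk in chunks]
    # Retrieve: take the chunks whose vectors best match the question vector.
    query_vec = embed(question)
    by_similarity = sorted(index, key=lambda item: dot(query_vec, item[1]), reverse=True)
    candidates = [chunk for chunk, _ in by_similarity[:retrieve_k]]
    # Rank: reorder the candidates and keep only the top-ranked chunks.
    top_chunks = rank(question, candidates)[:keep_k]
    # Generate: produce an answer grounded on the top-ranked chunks.
    return generate(question, top_chunks)


# --- Toy stand-ins, all hypothetical ---
VOCAB = ["invoice", "refund", "shipping", "days", "password"]


def toy_parse(text: str) -> list[str]:
    return [line for line in text.splitlines() if line.strip()]


def toy_embed(text: str) -> Vector:
    lowered = text.lower()
    return [float(lowered.count(term)) for term in VOCAB]


def toy_words(text: str) -> set[str]:
    return {w.strip(".,?!").lower() for w in text.split()}


def toy_rank(query: str, chunks: list[str]) -> list[str]:
    return sorted(chunks, key=lambda c: len(toy_words(query) & toy_words(c)), reverse=True)


def toy_generate(query: str, chunks: list[str]) -> str:
    return f"Answer to {query!r} grounded on: " + " / ".join(chunks)


documents = {
    "billing.txt": "Refund requests are processed within 5 days.\nEvery invoice lists the refund deadline.",
    "account.txt": "Reset your password from the login page.\nShipping takes 3 days.",
}
answer = run_rag_workflow(
    documents, "How long does a refund take in days?",
    toy_parse, toy_embed, toy_rank, toy_generate,
)
print(answer)
```

Passing each stage in as a callable keeps the orchestration independent of any one service, so a stage can be swapped for a real API call without changing the surrounding flow.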

If you generated the answers using an answer generation model other than the Google models, you can check the grounding of these answers using the check grounding method.
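
The idea of a grounding check can be illustrated with a toy scorer. This is not how the check grounding API works internally, which this page does not describe: the check_grounding function below, its word-overlap scoring, and its threshold parameter are assumptions made purely for illustration. The sketch splits an answer into claims, looks for the reference text that best supports each claim, and reports the share of claims that have a citation.

```python
def words(text: str) -> set[str]:
    return {w.strip(".,?!").lower() for w in text.split()}


def check_grounding(answer: str, facts: list[str], threshold: float = 0.6):
    """Toy check: a claim is supported if enough of its words appear in a fact."""
    # Naive sentence split on periods; it would break on decimals or abbreviations.
    claims = [s.strip() for s in answer.split(".") if s.strip()]
    report = []
    for claim in claims:
        claim_words = words(claim)
        scores = [len(claim_words & words(f)) / len(claim_words) for f in facts]
        best = max(range(len(facts)), key=lambda i: scores[i])
        citation = best if scores[best] >= threshold else None
        report.append({"claim": claim, "citation": citation, "score": scores[best]})
    if not report:
        return 0.0, report
    support = sum(r["citation"] is not None for r in report) / len(report)
    return support, report


# Made-up reference texts and answer.
facts = [
    "Refund requests are processed within 5 days.",
    "Shipping takes 3 days.",
]
answer = "Refund requests are processed within 5 days. Refunds are paid in cash."

support, report = check_grounding(answer, facts)
for row in report:
    print(row)
print("support:", support)
```

Here the first claim is cited to the first reference text, while the second claim has no sufficiently similar reference and is left without a citation.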


Last updated 2026-02-18 UTC.
