Improve search and RAG quality with ranking API

As part of your Retrieval Augmented Generation (RAG) experience in Vertex AI Search, you can rank a set of documents based on a query.

The ranking API takes a list of documents and reranks those documents based on how relevant the documents are to a query. Compared to embeddings, which look only at the semantic similarity of a document and a query, the ranking API can give you precise scores for how well a document answers a given query. The ranking API can be used to improve the quality of search results after retrieving an initial set of candidate documents.

The ranking API is stateless, so there's no need to index documents before calling the API. All you need to do is pass in the query and documents. This makes the API well suited for reranking documents from Vector Search and other search solutions.

This page describes how to use the ranking API to rank a set of documents based on a query.

Use cases

The primary use case of the ranking API is to improve the quality of search results.

However, the ranking API can be valuable for any scenario where you need to find which pieces of content are most relevant to a user's query. For example, the ranking API can assist you in the following:

  • Finding the right content to give to an LLM for grounding

  • Improving the relevance of an existing search experience

  • Identifying relevant sections of a document

The following flow outlines how you might use the ranking API to improve the quality of results for chunked documents:

  1. Use Document AI Layout Parser API to split a set of documents into chunks.

  2. Use an embeddings API to create embeddings for each of the chunks.

  3. Load the embeddings into Vector Search or another search solution.

  4. Query your search index and retrieve the most relevant chunks.

  5. Rerank the relevant chunks using the ranking API.
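
Steps 4 and 5 of the flow above can be sketched as a small function that takes the retrieval step and the reranking step as callables. This is only an illustration: `retrieve_then_rerank`, `toy_retrieve`, and `toy_rerank` are hypothetical names, and the toy functions are local stand-ins (a fixed list and a word-overlap sort), not calls to Vector Search or the ranking API.

```python
from typing import Callable, Dict, List

Record = Dict[str, str]


def retrieve_then_rerank(
    query: str,
    retrieve: Callable[[str, int], List[Record]],
    rerank: Callable[[str, List[Record]], List[Record]],
    candidates: int = 50,
    top_n: int = 5,
) -> List[Record]:
    """Fetch a broad candidate set, then keep the best records after reranking."""
    chunks = retrieve(query, candidates)  # step 4: query the search index
    ranked = rerank(query, chunks)        # step 5: rerank the candidates
    return ranked[:top_n]


# Stand-in corpus and functions so that the sketch runs without any service.
CORPUS = [
    {"id": "1", "content": "The Gemini zodiac symbol often depicts two figures"},
    {"id": "2", "content": "Gemini is a large language model created by Google"},
]


def toy_retrieve(query: str, limit: int) -> List[Record]:
    # Assumption: a real implementation would query a search index here.
    return CORPUS[:limit]


def toy_rerank(query: str, records: List[Record]) -> List[Record]:
    # Assumption: a real implementation would call the ranking API here.
    words = set(query.lower().split())
    return sorted(
        records,
        key=lambda r: len(words & set(r["content"].lower().split())),
        reverse=True,
    )


top = retrieve_then_rerank(
    "what is google gemini", toy_retrieve, toy_rerank, top_n=1
)
print(top)
```

Retrieving more candidates than you finally need gives the reranking step room to promote records that the first-stage search ranked low.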

Input data

Key Term: The ranking API uses the term record to indicate a document. A record is made up of an ID, a title, and the content of a document. Unlike the documents that are contained in Vertex AI Search data stores, the records input to the ranking API are in JSON format and have not been indexed by Vertex AI Search.

The ranking API requires the following inputs:

  • The query for which you're ranking the records.

    For example:

    "query":"Why is the sky blue?"
  • A set of records that are relevant to the query. The records are provided as an array of objects. Each record can include a unique ID, a title, and the content of the document. For each record, include either a title, content, or both. The maximum supported tokens per record depends on the model version being used. For example, models up to version 003 support 512 tokens, while version 004 supports 1024 tokens. If the combined length of the title and content exceeds the model's token limit, the extra content is truncated. You can include up to 200 records per request.

    For example, a record array looks something like this. In reality, many more records would be included in the array and the content would be much longer:

    "records":[{"id":"1","title":"The Color of the Sky: A Poem","content":"A canvas stretched across the day,\nWhere sunlight learns to dance and play.\nBlue, a hue of scattered light,\nA gentle whisper, soft and bright."},{"id":"2","title":"The Science of a Blue Sky","content":"The sky appears blue due to a phenomenon called Rayleigh scattering. Sunlight is comprised of all the colors of the rainbow. Blue light has shorter wavelengths than other colors, and is thus scattered more easily."}]
  • Optional: The maximum number of records that you want the ranking API to return. By default, all records are returned; however, you can use the topN field to return fewer records. All records are ranked regardless of what value is set.

    For example, this returns the top 10 ranked records:

    "topN":10,
  • Optional: A setting that specifies whether you want just the ID of the record returned by the API or if you want the record title and content returned as well. By default, the full record is returned. The main reason to set this is to reduce the size of the response payload.

    For example, setting the field to true returns only the record ID, not the title or content:

    "ignoreRecordDetailsInResponse":true,
  • Optional: The model name. This specifies the model to be used for ranking the documents. If no model is specified, then semantic-ranker-default@latest is used, which automatically points to the latest available model. To point to a specific model, specify one of the model names listed in Supported models, for example semantic-ranker-512-003.

    In the following example, model is set to semantic-ranker-default@latest. This means that the ranking API will always use the latest available model.

    "model":"semantic-ranker-default@latest"

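The inputs listed above can be assembled into a single request body. The following is a minimal sketch that builds the body as a Python dictionary and checks the two constraints stated in the text (at most 200 records, and a title or content on every record). The helper name `build_rank_request` and its parameters are hypothetical and are not part of any client library.

```python
import json

MAX_RECORDS_PER_REQUEST = 200  # limit stated in the input description above


def build_rank_request(query, records, top_n=None, ids_only=False,
                       model="semantic-ranker-default@latest"):
    """Return a dict shaped like the request body described above."""
    if len(records) > MAX_RECORDS_PER_REQUEST:
        raise ValueError("a request can include at most 200 records")
    for record in records:
        # Each record needs a title, content, or both.
        if not (record.get("title") or record.get("content")):
            raise ValueError(f"record {record.get('id')!r} has no title or content")

    body = {"model": model, "query": query, "records": list(records)}
    if top_n is not None:
        body["topN"] = top_n  # optional: return only the top-ranked records
    if ids_only:
        body["ignoreRecordDetailsInResponse"] = True  # optional: smaller response
    return body


body = build_rank_request(
    "Why is the sky blue?",
    [
        {"id": "1", "title": "The Color of the Sky: A Poem"},
        {"id": "2", "content": "The sky appears blue due to Rayleigh scattering."},
    ],
    top_n=10,
    ids_only=True,
)
print(json.dumps(body, indent=2))
```

Validating locally is a design choice in this sketch; it catches malformed records before a network call is made.
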
Output data

The ranking API returns a ranked list of records with the following outputs:

  • Score: a float value between 0 and 1 that indicates relevance of the record.

  • ID: the unique ID of the record.

  • If requested, the full object: the ID, title, and content.

    For example:

{"records":[{"id":"2","score":0.98,"title":"The Science of a Blue Sky","content":"The sky appears blue due to a phenomenon called Rayleigh scattering. Sunlight is comprised of all the colors of the rainbow. Blue light has shorter wavelengths than other colors, and is thus scattered more easily."},{"id":"1","score":0.64,"title":"The Color of the Sky: A Poem","content":"A canvas stretched across the day,\nWhere sunlight learns to dance and play.\nBlue, a hue of scattered light,\nA gentle whisper, soft and bright."}]}
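
A response like the one above can be consumed with a few lines of code. The following sketch parses a shortened copy of the example response (the content fields are omitted for brevity) and keeps only records at or above a score cutoff. The helper `records_at_or_above` and the cutoff of 0.7 are hypothetical; a suitable threshold depends on your data.

```python
import json

# Shortened copy of the example response above (content fields omitted).
response_text = (
    '{"records":['
    '{"id":"2","score":0.98,"title":"The Science of a Blue Sky"},'
    '{"id":"1","score":0.64,"title":"The Color of the Sky: A Poem"}'
    ']}'
)


def records_at_or_above(response, min_score):
    """Keep records whose relevance score meets the cutoff, preserving rank order."""
    return [r for r in response["records"] if r["score"] >= min_score]


response = json.loads(response_text)
relevant = records_at_or_above(response, 0.7)  # 0.7 is an illustrative cutoff
for record in relevant:
    print(record["id"], record["score"], record["title"])
```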

Rank (or rerank) a set of records according to a query

Typically, you'll supply the ranking API with a query and a set of records that are relevant to that query and have already been ranked by some other method such as a keyword search or a vector search. Then, you use the ranking API to improve the quality of the ranking and determine a score that indicates the relevance of each record to the query.

  1. Obtain the query and resulting records. Ensure that each record has an ID and either a title, content, or both.

    The maximum number of supported tokens per record depends on the model version. Models up to version 003, such as semantic-ranker-512-003, support 512 tokens per record. Starting from version 004, this limit increases to 1024 tokens. If the combined length of the title and content exceeds the model's token limit, the extra content is truncated.

  2. Call the rankingConfigs.rank method using the following code:

REST

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: PROJECT_ID" \
  "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/rankingConfigs/default_ranking_config:rank" \
  -d '{
    "model": "semantic-ranker-default@latest",
    "query": "QUERY",
    "records": [
      {
        "id": "RECORD_ID_1",
        "title": "TITLE_1",
        "content": "CONTENT_1"
      },
      {
        "id": "RECORD_ID_2",
        "title": "TITLE_2",
        "content": "CONTENT_2"
      },
      {
        "id": "RECORD_ID_3",
        "title": "TITLE_3",
        "content": "CONTENT_3"
      }
    ]
  }'

Replace the following:

  • PROJECT_ID: the ID of your Google Cloud project.
  • QUERY: the query against which the records are ranked and scored.
  • RECORD_ID_n: a unique string that identifies the record.
  • TITLE_n: the title of the record.
  • CONTENT_n: the content of the record.
Note: You can test the API by typing your own values into these placeholders and pasting the command into Cloud Shell. However, in reality, the records array would be larger.

For general information about this method, see rankingConfigs.rank.

Example curl command and response:

curl -X POST \
  -H "Authorization: Bearer $(gcloud auth print-access-token)" \
  -H "Content-Type: application/json" \
  -H "X-Goog-User-Project: my-project-123" \
  "https://discoveryengine.googleapis.com/v1/projects/my-project-123/locations/global/rankingConfigs/default_ranking_config:rank" \
  -d '{
    "model": "semantic-ranker-default@latest",
    "query": "what is Google gemini?",
    "records": [
      {
        "id": "1",
        "title": "Gemini",
        "content": "The Gemini zodiac symbol often depicts two figures standing side-by-side."
      },
      {
        "id": "2",
        "title": "Gemini",
        "content": "Gemini is a cutting edge large language model created by Google."
      },
      {
        "id": "3",
        "title": "Gemini Constellation",
        "content": "Gemini is a constellation that can be seen in the night sky."
      }
    ]
  }'

{"records":[
  {"id":"2","title":"Gemini","content":"Gemini is a cutting edge large language model created by Google.","score":0.97},
  {"id":"3","title":"Gemini Constellation","content":"Gemini is a constellation that can be seen in the night sky.","score":0.18},
  {"id":"1","title":"Gemini","content":"The Gemini zodiac symbol often depicts two figures standing side-by-side.","score":0.05}
]}

Python

For more information, see the Vertex AI Search Python API reference documentation.

To authenticate to Vertex AI Search, set up Application Default Credentials. For more information, see Set up authentication for a local development environment.

from google.cloud import discoveryengine_v1 as discoveryengine

# TODO(developer): Uncomment these variables before running the sample.
# project_id = "YOUR_PROJECT_ID"

client = discoveryengine.RankServiceClient()

# The full resource name of the ranking config.
# Format: projects/{project_id}/locations/{location}/rankingConfigs/default_ranking_config
ranking_config = client.ranking_config_path(
    project=project_id,
    location="global",
    ranking_config="default_ranking_config",
)
request = discoveryengine.RankRequest(
    ranking_config=ranking_config,
    model="semantic-ranker-default@latest",
    top_n=10,
    query="What is Google Gemini?",
    records=[
        discoveryengine.RankingRecord(
            id="1",
            title="Gemini",
            content="The Gemini zodiac symbol often depicts two figures standing side-by-side.",
        ),
        discoveryengine.RankingRecord(
            id="2",
            title="Gemini",
            content="Gemini is a cutting edge large language model created by Google.",
        ),
        discoveryengine.RankingRecord(
            id="3",
            title="Gemini Constellation",
            content="Gemini is a constellation that can be seen in the night sky.",
        ),
    ],
)

response = client.rank(request=request)

# Handle the response
print(response)
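
If you ask the API to return only record IDs, you need to join the ranked IDs back to your own copies of the documents. The following sketch shows one way to do that with plain Python data. The helper `join_ranked_ids` is hypothetical, and the scores are copied from the REST example above rather than produced by a live call. With the Python client, the (ID, score) pairs could plausibly be built from the `id` and `score` fields of each record in `response.records`.

```python
def join_ranked_ids(ranked_pairs, documents_by_id):
    """Attach locally stored documents to ranked (id, score) pairs, keeping rank order."""
    results = []
    for record_id, score in ranked_pairs:
        document = documents_by_id.get(record_id)
        if document is None:
            continue  # skip IDs that are missing from the local store
        results.append({"id": record_id, "score": score, "document": document})
    return results


# Local copies of the documents, keyed by the same IDs sent in the request.
documents_by_id = {
    "1": "The Gemini zodiac symbol often depicts two figures standing side-by-side.",
    "2": "Gemini is a cutting edge large language model created by Google.",
    "3": "Gemini is a constellation that can be seen in the night sky.",
}

# (id, score) pairs in ranked order; values taken from the REST example above.
ranked_pairs = [("2", 0.97), ("3", 0.18), ("1", 0.05)]

joined = join_ranked_ids(ranked_pairs, documents_by_id)
for item in joined:
    print(item["score"], item["document"])
```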

Supported models

The following models are available.

Model name | Latest model (semantic-ranker-default@latest) | Input | Context window | Release date | Discontinuation date
semantic-ranker-default-004 | Yes | Text (25 languages) | 1024 | April 9, 2025 | To be determined
semantic-ranker-fast-004 | No | Text (25 languages) | 1024 | April 9, 2025 | To be determined
semantic-ranker-default-003 | No | Text (25 languages) | 512 | September 10, 2024 | To be determined
semantic-ranker-default-002 | No | Text (en only) | 512 | June 3, 2024 | To be determined
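
Because content beyond the context window is truncated, it can help to flag records that are likely to exceed the limit for the model you pin. The sketch below copies the context windows from the table above into a lookup. The helper `exceeds_context_window` is hypothetical, and the token count is supplied by the caller because this page does not describe how the models count tokens.

```python
# Context windows (tokens per record), copied from the table above.
CONTEXT_WINDOW_TOKENS = {
    "semantic-ranker-default-004": 1024,
    "semantic-ranker-fast-004": 1024,
    "semantic-ranker-default-003": 512,
    "semantic-ranker-default-002": 512,
}


def exceeds_context_window(estimated_tokens, model):
    """Return True if a record of this estimated size would be truncated."""
    return estimated_tokens > CONTEXT_WINDOW_TOKENS[model]


# A record estimated at 600 tokens fits version 004 but not version 003.
print(exceeds_context_window(600, "semantic-ranker-default-004"))
print(exceeds_context_window(600, "semantic-ranker-default-003"))
```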

What's next

Learn how to use the ranking method with other RAG APIs to generate grounded answers from unstructured data.


Last updated 2026-02-19 UTC.