Customize search results ranking

Because search needs can differ for different industries and can vary from time to time, the default ranking behavior might not be optimal for every business need. To address this, you can modify the ranking behavior using custom ranking.

This page describes how to use a custom ranking formula in your search request and how to tune the formula. This feature is available for structured, unstructured, and website data.

Overview

Custom ranking lets you provide a mathematical expression that relies on a set of model-computed signals, such as semantic relevance score and keyword similarity score, and document-based signals, such as a custom field like distance or document age.

With custom ranking, you can achieve the following:

  • Gain visibility: Understand which signals contribute to the final ranking of your search results.
  • Tune existing signals: Adjust the weights of various signals like semantic similarity, keyword matching, or document freshness.
  • Incorporate business logic: Add your own custom signals from your document data directly into the ranking formula.
  • Optimize systematically: Use the open-source Python library to programmatically discover the optimal ranking formula.

Need for custom ranking—an example

Consider a scenario where the following string is queried on a hotel booking website:

luxury hotel with a large rooftop pool in Vancouver, pet-friendly and close to airport.

Suppose the following entries are retrieved:

  • Hotel A: "Vancouver's premier luxury hotel overlooking the airport. Features a stunning rooftop pool. No pets allowed."
  • Hotel B: "Modern, stylish hotel in downtown Vancouver. Pet-friendly with spacious rooms. Features a large indoor pool and fitness center."
  • Hotel C: "A charming pet-friendly boutique hotel near Aquarium (a 10-minute walk from downtown). Features a lovely garden courtyard. No pool."
  • Hotel D: "An iconic rustic resort. Known for its exquisite dining and impeccable service. Features an indoor pool and spa. Pet-friendly options available on request."

All hotels in the catalog include a field distance_from_airport in kilometers (km).

Embedding-based ranking

The search system converts the query into a single embedding. It then compares this query embedding to the embeddings of all hotels in its catalog. Hotels with embeddings that are numerically closest to the query's embedding are ranked higher.

Here's the likely ranking from a purely embedding-based relevance search:

| Ranking | Hotel | Possible reason for this ranking |
| --- | --- | --- |
| 1 | Hotel A | Very strong semantic match for luxury, airport, rooftop pool. The "no pets" isn't desirable, but the other strong matches dominate. |
| 2 | Hotel B | Good semantic match for "pet-friendly" and "pool". But "indoor" instead of "rooftop", "modern" and "stylish" instead of "luxury", and "downtown" instead of "airport" make it less relevant than A. |
| 3 | Hotel D | Strong semantic match for pet-friendly, large pool, but "indoor" instead of "rooftop" and "rustic" instead of "luxury" make it slightly less semantically relevant than A and B. |
| 4 | Hotel C | Strong pet-friendly, but "no pool" and "boutique" significantly reduce its relevance to this specific query. |

This ranking doesn't deliver the most relevant results. Hotel A is ranked at the top, even though with "no pets allowed" it might not be preferred by many users. Hotel D, which fits many criteria, is ranked lower because its "rustic" status doesn't necessarily map to "luxury" and its "indoor" pool is ranked lower than exact matches of "large" and "outdoor".

Custom ranking

Suppose you have configured the following ranking expression for this example scenario. For information about the components of this expression, see About implementing custom ranking.

rankingExpression = rr(semantic_similarity_score, 32) * 0.4 + rr(keyword_similarity_score, 32) * 0.3 + rr(c.distance_from_airport * -1, 32) * 0.8

Where distance_from_airport is a retrievable field in the catalog and c.distance_from_airport acts as a signal.

In custom ranking, you consider different signals that influence the relevance of a document. Then, you create a mathematical expression containing these signals using a valid syntax. In this expression, you normalize the signals and add weights to their derived scores. The final custom score is calculated and the documents are ranked.

In this example, this process can be explained as follows:

  1. Each hotel is awarded a semantic similarity score and a keyword similarity score. Additionally, distance from the airport is an important signal derived from the document.

  2. The reciprocal rank transformation function, or rr(), is used to transform all the scores to the same scale.

  3. The score derived from each signal is given a weight and then the sum of all the individual scores becomes the custom-ranking score for each hotel.
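
Written out with the rr(expression, k) definition from the Ranking formula syntax section, the example expression gives each hotel h the following score. The symbols r_sem, r_kw, and r_dist are notation introduced here for illustration, not part of the API: they stand for the hotel's 0-based position when all retrieved hotels are sorted in descending order by semantic similarity score, keyword similarity score, and negated distance from the airport, respectively.

```latex
\[
\mathrm{score}(h) \;=\; \frac{0.4}{r_{\mathrm{sem}}(h) + 32}
\;+\; \frac{0.3}{r_{\mathrm{kw}}(h) + 32}
\;+\; \frac{0.8}{r_{\mathrm{dist}}(h) + 32}
\]
```

Because the distance is multiplied by -1 before ranking, the hotel closest to the airport has r_dist = 0 and receives the largest distance term, 0.8 / 32 = 0.025.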

The different signals for each hotel are tabulated as follows:

| Hotel | semantic_similarity_score | keyword_similarity_score | c.distance_from_airport (km) | Custom ranking score | Custom ranking | Embedding-based ranking |
| --- | --- | --- | --- | --- | --- | --- |
| Hotel A | 9.0 | 6.2 ("airport", "luxury", "rooftop pool") | 5.0 | 0.04879 | 2 | 1 |
| Hotel B | 7.5 | 5.6 ("pet-friendly", "downtown", "indoor pool", "stylish") | 12.5 | 0.04691 | 3 | 2 |
| Hotel C | 5.0 | 3.4 ("pet-friendly", "downtown") | 18 | 0.04525 | 4 | 4 |
| Hotel D | 8.0 | 4.5 ("indoor pool", "pet-friendly", "rustic") | 1 | 0.04890 | 1 | 3 |

Comparing the two ranking methods, custom ranking gives a more considered ranking that likely matches a user's needs better than a purely embedding-based ranking.

About implementing custom ranking

To get custom ranking in your search results, you must call the search method by providing the following fields:

  • Ranking expression backend (rankingExpressionBackend): This field indicates which of the following ranking mechanisms is to be used.

    • RANK_BY_EMBEDDING: This is the default value when this field is unspecified. Choosing this ranks the results according to a predefined ranking expression that's either embedding-based or relevance-based.
    • RANK_BY_FORMULA: This overrides the default ranking and lets you provide your custom formula in the rankingExpression field.
  • Ranking expression (rankingExpression): This field contains a mathematical formula that decides the ranking of the retrieved documents.

    • For RANK_BY_EMBEDDING, this is either relevance-score based (double * relevanceScore) or embedding-based (double * dotProduct(embedding_field_path)).

    • For RANK_BY_FORMULA, this is a curated expression that combines multiple signals to compute a new score for each search result.
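
As a minimal sketch, the two fields described above can be combined into a JSON request body as follows. The helper name build_search_request and the sample query are hypothetical. The field names follow the curl example later on this page, and PROJECT_ID and APP_ID remain placeholders that you replace yourself.

```python
import json


def build_search_request(serving_config: str, query: str, ranking_expression: str) -> dict:
    """Build a search request body that ranks results with a custom formula.

    Hypothetical helper: it only assembles the JSON body and sends nothing.
    """
    return {
        "servingConfig": serving_config,
        "query": query,
        "rankingExpression": ranking_expression,
        # RANK_BY_FORMULA overrides the default RANK_BY_EMBEDDING backend.
        "rankingExpressionBackend": "RANK_BY_FORMULA",
    }


body = build_search_request(
    serving_config=(
        "projects/PROJECT_ID/locations/global/collections/default_collection/"
        "engines/APP_ID/servingConfigs/default_search"
    ),
    query="luxury hotel with a rooftop pool",  # sample query, for illustration only
    ranking_expression=(
        "rr(semantic_similarity_score, 32) * 0.4 + "
        "rr(keyword_similarity_score, 32) * 0.3"
    ),
)
print(json.dumps(body, indent=2))
```

Keeping the body construction in one function makes it easier to try several ranking expressions against the same query.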

Standard signals

Vertex AI Search offers a variety of signals that you can use to formulate custom ranking. Here are the standard signals available:

| Signal name | Description |
| --- | --- |
| default_rank | The default rank of the document as determined by the standard VAIS ranking algorithm. |
| semantic_similarity_score | A score computed based on query and content embeddings to determine how similar a search query is to a document's content. This is computed using a proprietary Google algorithm. |
| relevance_score | A score produced by a deep-relevance model, which handles complex query-document interactions. The model determines the meaning and intention of a query in the context of the content. This is computed using a proprietary Google algorithm. |
| keyword_similarity_score | A score with a strong emphasis on keyword matching. This signal uses the Best Match 25 (BM25) ranking function. |
| document_age | The age of the document in hours. Supports floating-point values. For example, a value of 0.5 means 30 minutes, while 50 means 2 days and 2 hours. |
| pctr_rank | A rank to denote predicted conversion rates, computed based on user event data. This signal uses predicted click-through rate (pCTR) to gauge the relevance of a search result from a user's perspective. |
| topicality_rank | A rank to denote keyword similarity adjustment computed using a proprietary Google algorithm. |
| boosting_factor | A combination of all custom boosts you have applied to the document. |

Custom signals

In addition to the standard signals, you can use signals from any numeric custom field in a document that is marked as retrievable. To do so, add the c. prefix to the field name. For example, if you have a custom field named date_approved, then you can use c.date_approved as a custom signal.

Signal names are a combination of alphabetical characters and underscores (_). The following is a list of reserved names that can't be used as signal names: log, exp, rr, is_nan, and fill_nan.
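
The naming rules above can be checked before a formula is sent. The following is a minimal sketch with a hypothetical helper, custom_signal. It assumes that the "alphabetical characters and underscores" rule applies to the field name that follows the c. prefix, which this page doesn't state explicitly.

```python
import re

# Reserved names listed on this page; they can't be used as signal names.
RESERVED_NAMES = {"log", "exp", "rr", "is_nan", "fill_nan"}

# Assumption: a valid name consists only of ASCII letters and underscores.
_NAME_PATTERN = re.compile(r"[A-Za-z_]+")


def custom_signal(field_name: str) -> str:
    """Return the custom signal name (c.<field>) for a numeric, retrievable field."""
    if not _NAME_PATTERN.fullmatch(field_name):
        raise ValueError(f"{field_name!r} must contain only letters and underscores")
    if field_name in RESERVED_NAMES:
        raise ValueError(f"{field_name!r} is a reserved name")
    return f"c.{field_name}"


print(custom_signal("date_approved"))
```

The check is purely local. It can't confirm that the field exists, is numeric, or is marked as retrievable in your schema.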

Geodistance—a derived signal

Private Preview

This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.

Note: To access this feature, you must be on the allowlist. For more information, contact your Google account manager.

Derived signals, such as geodistance, are computed based on standard and custom signals. Geodistance is a function that computes the distance between a source and a destination location. The geo_distance() function is expressed as geo_distance(source_location, destination_location). It is composed of the following arguments:

  • The source location or source_location: The origin for calculating distance, which can be one of the following types:

    • Query location: the location that's parsed from the query using natural language understanding models. For example, in the query Hotels along the M6, the natural language understanding model extracts Hotels as the what and M6 as the where part of the search parameters. The where part is the query location and can be represented as a point, polyline, circle, or polygon.

      {"query": "Hotels along M6", "ranking_expression": "geo_distance(query_loc, c.hotel_location)", "ranking_expression_backend": "RANK_BY_FORMULA"}
    • Request location: a location that's explicitly provided in the search request, such as a user's latitude and longitude. For example, you can provide the query as Hotels and provide a location using the latitude and longitude.

      {"query": "Hotels", "user_location": {"point": {"lat": 52.23034637633789, "lon": 20.98339855121653}}, "ranking_expression": "geo_distance(request_loc, c.hotel_location)", "ranking_expression_backend": "RANK_BY_FORMULA"}
  • The destination location or destination_location: The destination for calculating distance, which is a custom retrievable field such as c.office_location or c.home_location.

The order of these arguments within the function must remain the same. That is, the source location must always be the first argument within the geo_distance() function, followed by the destination location. The function computes the distance, in meters, using the latitude and longitude of the source and destination locations.
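
To give a feel for a point-to-point distance in meters computed from latitude and longitude, here is a minimal sketch that uses the haversine great-circle formula. This is an illustrative stand-in: this page doesn't say which formula geo_distance() uses, and the sketch doesn't handle polylines, circles, or polygons. The Earth radius constant and the hotel coordinates are assumptions. The user coordinates come from the request location example above.

```python
import math

EARTH_RADIUS_M = 6_371_000.0  # mean Earth radius in meters (assumption of this sketch)


def haversine_m(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in meters between two (lat, lon) points given in degrees."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(a))


# Source first, destination second, mirroring the geo_distance() argument order.
user_location = (52.23034637633789, 20.98339855121653)  # from the request example
hotel_location = (52.2297, 21.0122)                      # hypothetical c.hotel_location
print(round(haversine_m(*user_location, *hotel_location)), "meters")
```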

Ranking formula syntax

The custom ranking formula is a mathematical expression with the following components:

  • Numbers (double): A positive or negative floating-point value that adds a weight to a signal or an expression.

  • Signals (signal): The names of the signals listed in the Standard signals section.

  • Arithmetic operators: + (addition) and * (multiplication).

  • Mathematical functions:

    • log(expression): The natural logarithm
    • exp(expression): The natural exponent

    Each of these functions accepts exactly one argument, which is an expression written in terms of a signal.

    Examples of a valid function: exp(c.document_age) and log(keyword_similarity_score * 0.2 + 1.0).

  • Reciprocal rank transformation function (rr): This function is expressed as rr(expression, k). It first sorts documents by the value of the expression in descending order and assigns the documents a rank. It then calculates the final value using the expression 1 / (rank_i + k), where rank_i is the document's position in the sorted list starting from 0 and k is a positive floating-point number you provide.

    The rr() function transforms all scores to the same scale and eliminates the need for additional normalization.

  • Not a number (NaN) handling functions:

    • is_nan(expression): When the expression evaluates to NaN, such as when a signal is missing for a document, 1 is returned. Otherwise, 0 is returned.
    • fill_nan(arg_expression, fill_with_expression): If arg_expression evaluates to NaN, returns fill_with_expression. Otherwise, returns arg_expression. This is crucial for handling documents that might be missing certain signals.
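
The semantics of rr(), is_nan(), and fill_nan() described above can be imitated locally to reason about a formula before using it. The following Python functions are a sketch of those descriptions, not the service's implementation. The signal values are hypothetical, and how the service breaks ties between equal values is not specified on this page (here, the earlier document wins).

```python
import math
from typing import List, Sequence


def is_nan(value: float) -> float:
    """Return 1.0 when the value is NaN (for example, a missing signal), else 0.0."""
    return 1.0 if math.isnan(value) else 0.0


def fill_nan(value: float, fill_with: float) -> float:
    """Return fill_with when the value is NaN, else the value itself."""
    return fill_with if math.isnan(value) else value


def rr(values: Sequence[float], k: float) -> List[float]:
    """Reciprocal rank: sort descending, rank from 0, return 1 / (rank + k) per document."""
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    transformed = [0.0] * len(values)
    for rank, doc_index in enumerate(order):
        transformed[doc_index] = 1.0 / (rank + k)
    return transformed


# Hypothetical semantic scores for three documents; the second one is missing.
semantic = [0.9, float("nan"), 0.4]
filled = [fill_nan(v, 0.0) for v in semantic]   # replace NaN before ranking
rr_semantic = rr(filled, 40)                    # mirrors rr(fill_nan(..., 0), 40)
missing_flags = [is_nan(v) for v in semantic]

print(rr_semantic)
print(missing_flags)
```

Filling NaN values before ranking matters in this sketch because NaN compares unpredictably during sorting.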

Ranking formula examples

Here are a few examples of ranking formulas that you can use in the rankingExpression field of your search request:

  • An elementary linear combination:

    semantic_similarity_score * 0.7 + keyword_similarity_score * 0.3
  • A complex formula using reciprocal rank and NaN handling:

    rr(fill_nan(semantic_similarity_score, 0), 40) * 0.5 + topicality_rank * 0.5
  • A complex formula using reciprocal rank, exponential function, and NaN handling:

    rr(fill_nan(semantic_similarity_score, 0), 40) * 0.2 + exp(keyword_similarity_score) * 0.3 + is_nan(keyword_similarity_score) * 0.1
  • A complex formula using reciprocal rank with the geo_distance() function (Private preview):

    rr(keyword_similarity_score, 16) * 0.8 + rr(geo_distance(query_loc, c.office_location) * -1, 16) * 0.2

    In this formula, the multiplication factor is a negative value so that larger distances correspond to lower expression values and, therefore, the reciprocal rank transformation ranks greater distances lower.
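
The effect of the negative factor can be checked with a few lines of Python. This is a sketch with hypothetical office names and distances: it applies the 1 / (rank + k) rule from the rr() description to negated distances, so the nearest office receives the largest value.

```python
# Hypothetical distances in meters, standing in for geo_distance() results.
distances_m = {"office_a": 250.0, "office_b": 4000.0, "office_c": 1200.0}

k = 16
negated = {doc: -distance for doc, distance in distances_m.items()}

# Sort by the negated distance in descending order; ranks start at 0.
ordered = sorted(negated, key=negated.get, reverse=True)
rr_values = {doc: 1.0 / (rank + k) for rank, doc in enumerate(ordered)}

for doc in ordered:
    print(doc, distances_m[doc], rr_values[doc])
```

Without the -1 factor, the farthest office would be sorted first and would receive the largest reciprocal rank value.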

Signals in the response

When a document is returned in the search response, the search result lists the standard and custom signals that contribute to retrieving the document from the data store. The rankSignals field lists these signals.
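
The following sketch shows one way to pull the rankSignals field out of a parsed search response. The sample response is hypothetical and heavily trimmed: the results and id keys, the document IDs, and the semanticSimilarityScore key are assumptions made for illustration. Only the rankSignals field and the keywordSimilarityScore name appear on this page.

```python
import json

# Hypothetical, trimmed response; a real response contains more fields.
sample_response = json.loads("""
{
  "results": [
    {"id": "doc-1", "rankSignals": {"keywordSimilarityScore": 6.2, "semanticSimilarityScore": 0.9}},
    {"id": "doc-2", "rankSignals": {"semanticSimilarityScore": 0.75}}
  ]
}
""")


def collect_rank_signals(response: dict) -> dict:
    """Map each result ID to its rankSignals dictionary (empty if absent)."""
    return {
        result["id"]: result.get("rankSignals", {})
        for result in response.get("results", [])
    }


signals = collect_rank_signals(sample_response)
for doc_id, doc_signals in signals.items():
    # A signal that is absent for a document is reported as None here.
    print(doc_id, doc_signals.get("keywordSimilarityScore"))
```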

Text fields for keyword similarity

In structured data stores, to obtain the keywordSimilarityScore signal in your search response, you must update your schema to do the following:

  • Map the text fields essential for keyword matching to the key properties title and description
  • Update the annotation for the text fields as Searchable

Customize ranking using ranking formula in search

To customize the ranking for your documents in your search results, manually draft a formula and add it to your search API call.

  1. Formulate a ranking expression.

  2. Get search results.

    curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
    -H "Content-Type: application/json" \
    "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/engines/APP_ID/servingConfigs/default_search:search" \
    -d '{
    "servingConfig": "projects/PROJECT_ID/locations/global/collections/default_collection/engines/APP_ID/servingConfigs/default_search",
    "query": "QUERY",
    "rankingExpression": "RANKING_EXPRESSION",
    "rankingExpressionBackend": "RANK_BY_FORMULA"
    }'

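    Replace the following:

    • PROJECT_ID: the ID of your Google Cloud project.
    • APP_ID: the ID of the Vertex AI Search app that you want to query.
    • QUERY: the query text to search.
    • RANKING_EXPRESSION: the custom ranking formula that you formulated in the previous step.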

Tune ranking formula using the Python library

For more advanced use cases, finding the optimal weights for your formula can be challenging. To overcome this, you can use Vertex AI Search's ranking tuning Python library, which is an open-source tool, and arrive at a suitable formula for your use case.

The general workflow is as follows:

  1. Prepare a dataset of queries with corresponding golden labels. These golden labels can be unique identifying fields, such as the document ID, that help you match each label to the corresponding SearchResult object in the search response.
  2. For a set of representative queries, call the search API to get the available ranking signals for all returned documents. You can find this in the SearchResult.rankSignals field. Store this data along with your golden labels.
  3. Use the Python library to train a ranking model on this dataset. For more information, see Clearbox Python library.

  4. Convert the formula from the training results into a ranking expression, which you can then use in your API calls.
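
As an illustration of the last step, the sketch below turns a set of per-signal weights into a ranking expression string that uses only the + and * operators from the formula syntax. It assumes that training yields one linear weight per signal. That output shape, the helper name to_ranking_expression, and the weights are hypothetical and don't reflect the library's documented format.

```python
def to_ranking_expression(weights: dict) -> str:
    """Render {signal: weight} as 'signal * weight + ...', skipping zero weights."""
    terms = [
        f"{signal} * {weight:.4f}"   # fixed-point output avoids scientific notation
        for signal, weight in weights.items()
        if weight != 0
    ]
    return " + ".join(terms)


# Hypothetical weights, as if produced by a training run.
learned_weights = {
    "semantic_similarity_score": 0.7,
    "keyword_similarity_score": 0.3,
    "c.distance_from_airport": -0.01,
    "document_age": 0.0,
}

ranking_expression = to_ranking_expression(learned_weights)
print(ranking_expression)
```

A negative weight is rendered as a multiplication by a negative number, the same pattern the example expressions on this page use for distance.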

Last updated 2026-02-19 UTC.