Customize search results ranking
Because search needs can differ for different industries and can vary from time to time, the default ranking behavior might not be optimal for every business need. To address this, you can modify the ranking behavior using custom ranking.
This page describes how to use a custom ranking formula in your search request and how to tune the formula. This feature is available for structured, unstructured, and website data.
Overview
Custom ranking lets you provide a mathematical expression that relies on a set of model-computed signals, such as semantic relevance score and keyword similarity score, and document-based signals, such as a custom field like distance or document age.
With custom ranking, you can achieve the following:
- Gain visibility: Understand which signals contribute to the final ranking of your search results.
- Tune existing signals: Adjust the weights of various signals like semantic similarity, keyword matching, or document freshness.
- Incorporate business logic: Add your own custom signals from your document data directly into the ranking formula.
- Optimize systematically: Use the open-source Python library to programmatically discover the optimal ranking formula.
Need for custom ranking—an example
Consider a scenario where the following string is queried on a hotel booking website:
luxury hotel with a large rooftop pool in Vancouver, pet-friendly and close to airport.
Say, the following entries are retrieved:
- Hotel A: "Vancouver's premier luxury hotel overlooking the airport. Features a stunning rooftop pool. No pets allowed."
- Hotel B: "Modern, stylish hotel in downtown Vancouver. Pet-friendly with spacious rooms. Features a large indoor pool and fitness center."
- Hotel C: "A charming pet-friendly boutique hotel near Aquarium (a 10-minute walk from downtown). Features a lovely garden courtyard. No pool."
- Hotel D: "An iconic rustic resort. Known for its exquisite dining and impeccable service. Features an indoor pool and spa. Pet-friendly options available on request."
All hotels in the catalog include a field distance_from_airport in kilometers (km).
Embedding-based ranking
The search system converts the query into a single embedding. It then compares this query embedding to the embeddings of all hotels in its catalog. Hotels with embeddings that are numerically closest to the query's embedding are ranked higher.
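As a rough illustration of this comparison step, the following sketch ranks documents by the cosine similarity between a query embedding and each document embedding. The three-dimensional vectors, the document IDs, and the helper names are made up for illustration, and cosine similarity is only one plausible reading of "numerically closest"; the similarity measure the service actually uses isn't stated on this page.

```python
from math import sqrt


def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = sqrt(sum(x * x for x in a))
    norm_b = sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_embedding(query_embedding, doc_embeddings):
    """Return document IDs sorted from most to least similar to the query."""
    return sorted(
        doc_embeddings,
        key=lambda doc_id: cosine_similarity(query_embedding, doc_embeddings[doc_id]),
        reverse=True,
    )


# Hypothetical toy embeddings; real embeddings come from a model
# and have far more dimensions.
query = [0.9, 0.8, 0.1]
docs = {
    "doc_1": [0.8, 0.9, 0.2],
    "doc_2": [0.1, 0.2, 0.9],
    "doc_3": [0.5, 0.5, 0.5],
}

ranking = rank_by_embedding(query, docs)
print(ranking)
```

Because the whole query collapses into one vector, a single negative detail in a document (such as "no pets allowed") can be outweighed by several strong matches.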
Here's the likely ranking from a purely embedding-based relevance search:
| Ranking | Hotel | Possible reason for this ranking |
|---|---|---|
| 1 | Hotel A | Very strong semantic match for luxury, airport, rooftop pool. The "no pets" isn't desirable, but the other strong matches dominate. |
| 2 | Hotel B | Good semantic match for "pet-friendly" and "pool". But "indoor" instead of "rooftop", "modern" and "stylish" instead of "luxury" and "downtown" instead of "airport" make it less relevant than A. |
| 3 | Hotel D | Strong semantic match for pet-friendly and pool, but "indoor" instead of "rooftop" and "rustic" instead of "luxury" make it slightly less semantically relevant than A and B. |
| 4 | Hotel C | Strong pet-friendly, but "no pool" and "boutique" significantly reduce its relevance to this specific query. |
This ranking doesn't deliver the most relevant results. Hotel A is ranked at the top, even though with "no pets allowed" it might not be preferred by many users. Hotel D, which fits many criteria, is ranked lower because its "rustic" status doesn't necessarily map to "luxury" and its "indoor" pool scores lower than exact matches of "large" and "rooftop".
Custom ranking
Say, you have configured the following ranking expression for this example scenario. For information about the components of this expression, see About implementing custom ranking.
rankingExpression = rr(semantic_similarity_score, 32) * 0.4 + rr(keyword_similarity_score, 32) * 0.3 + rr(c.distance_from_airport * -1, 32) * 0.8
Where distance_from_airport is a retrievable field in the catalog and c.distance_from_airport acts as a signal.
In custom ranking, you consider different signals that influence the relevance of a document. Then, you create a mathematical expression containing these signals using a valid syntax. In this expression, you normalize the signals and add weights to their derived scores. The final custom score is calculated and the documents are ranked.
In this example, this process can be explained as follows:
- Each hotel is awarded a semantic similarity score and a keyword similarity score. Additionally, distance from the airport is an important signal derived from the document.
- The reciprocal rank transformation function or rr() is used to transform all the scores to the same scale.
- The score derived from each signal is given a weight and then the sum of all the individual scores becomes the custom-ranking score for each hotel.
The different signals for each hotel are tabulated as follows:
| Hotel | semantic_similarity_score | keyword_similarity_score | c.distance_from_airport | Custom ranking score | Custom ranking | Embedding-based ranking |
|---|---|---|---|---|---|---|
| Hotel A | 9.0 | 6.2 ("airport", "luxury", "rooftop pool") | 5.0 | 0.04879 | 2 | 1 |
| Hotel B | 7.5 | 5.6 ("pet-friendly", "downtown", "indoor pool", "stylish") | 12.5 | 0.04691 | 3 | 2 |
| Hotel C | 5.0 | 3.4 ("pet-friendly", "downtown") | 18 | 0.04525 | 4 | 4 |
| Hotel D | 8.0 | 4.5 ("indoor pool", "pet-friendly", "rustic") | 1 | 0.04890 | 1 | 3 |
Comparing the two ranking methods, custom ranking gives a more considered ranking that likely matches a user's needs better than a purely embedding-based ranking.
About implementing custom ranking
To get custom ranking in your search results, you must call the search method by providing the following fields:
- Ranking expression backend (rankingExpressionBackend): This field indicates which of the following ranking mechanisms is to be used.
  - RANK_BY_EMBEDDING: This is the default value when this field is unspecified. Choosing this ranks the results according to a predefined ranking expression that's either embedding-based or relevance-based.
  - RANK_BY_FORMULA: This overrides the default ranking and lets you provide your custom formula in the rankingExpression field.
- Ranking expression (rankingExpression): This field contains a mathematical formula that decides the ranking of the retrieved documents.
  - For RANK_BY_EMBEDDING, this is either relevance-score based (double * relevanceScore) or embedding-based (double * dotProduct(embedding_field_path)).
  - For RANK_BY_FORMULA, this is a curated expression that combines multiple signals to compute a new score for each search result.
Standard signals
Vertex AI Search offers a variety of signals that you can use to formulate custom ranking. Here are the standard signals available:
| Signal name | Description |
|---|---|
| default_rank | The default rank of the document as determined by the standard Vertex AI Search (VAIS) ranking algorithm. |
| semantic_similarity_score | A score computed based on query and content embeddings to determine how similar a search query is to a document's content. This is computed using a proprietary Google algorithm. |
| relevance_score | A score produced by a deep-relevance model, which handles complex query-document interactions. The model determines the meaning and intention of a query in the context of the content. This is computed using a proprietary Google algorithm. |
| keyword_similarity_score | A score with a strong emphasis on keyword matching. This signal uses the Best Match 25 (BM25) ranking function. |
| document_age | The age of the document in hours. Supports floating point values. For example, a value of 0.5 means 30 minutes while 50 means 2 days and 2 hours. |
| pctr_rank | A rank to denote predicted conversion rates, computed based on user event data. This signal uses predicted click-through rate (pCTR) to gauge the relevance of a search result from a user's perspective. |
| topicality_rank | A rank to denote keyword similarity adjustment computed using a proprietary Google algorithm. |
| boosting_factor | A combination of all custom boosts you have applied to the document. |
Custom signals
In addition to the standard signals, you can use signals from any numeric custom field in a document that is marked as retrievable. To do so, add the c. prefix to their field names. For example, if you have a custom field named date_approved, then you can use c.date_approved as a custom signal.
Signal names are a combination of alphabetical characters and underscores (_). The following is a list of reserved names that can't be used as signal names: log, exp, rr, is_nan, and fill_nan.
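The naming rules above can be checked before a formula is sent. The following is a minimal sketch; the is_valid_signal_name helper is hypothetical and not part of any Google library, and it assumes that the reserved names also apply to the part of a custom signal name that follows the c. prefix.

```python
import re

# Reserved function names listed in this section.
RESERVED_NAMES = {"log", "exp", "rr", "is_nan", "fill_nan"}

# Alphabetical characters and underscores only.
_NAME_PATTERN = re.compile(r"[A-Za-z_]+")


def is_valid_signal_name(name):
    """Check a signal name against the naming rules described above."""
    # Custom signals carry a "c." prefix; validate the part after it.
    # Assumption: reserved names are rejected with or without the prefix.
    field = name[2:] if name.startswith("c.") else name
    return bool(_NAME_PATTERN.fullmatch(field)) and field not in RESERVED_NAMES


for candidate in ["c.date_approved", "keyword_similarity_score", "rr", "c.price2"]:
    print(candidate, is_valid_signal_name(candidate))
```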
Geodistance—a derived signal
Private Preview
This feature is subject to the "Pre-GA Offerings Terms" in the General Service Terms section of the Service Specific Terms. Pre-GA features are available "as is" and might have limited support. For more information, see the launch stage descriptions.
Note: To access this feature, you must be on the allowlist. For more information, contact your Google account manager.
Derived signals, such as geodistance, are computed based on standard and custom signals. Geodistance is a function that computes the distance between a source and a destination location. The geo_distance() function is expressed as geo_distance(source_location, destination_location). It is composed of the following arguments:
- The source location or source_location: The origin for calculating distance, which can be one of the following types:
  - Query location: the location that's parsed from the query using natural language understanding models. For example, in the query Hotels along the M6, the natural language understanding model extracts Hotels as the what and M6 as the where part of the search parameters. The where part is the query location and can be represented as a point, polyline, circle, or polygon.
    {"query": "Hotels along M6", "ranking_expression": "geo_distance(query_loc, c.hotel_location)", "ranking_expression_backend": "RANK_BY_FORMULA"}
  - Request location: a location that's explicitly provided in the search request, such as a user's latitude and longitude. For example, you can provide the query as Hotels and provide a location using the latitude and longitude.
    {"query": "Hotels", "user_location": {"point": {"lat": 52.23034637633789, "lon": 20.98339855121653}}, "ranking_expression": "geo_distance(request_loc, c.hotel_location)", "ranking_expression_backend": "RANK_BY_FORMULA"}
- The destination location or destination_location: The destination for calculating distance, which is a custom retrievable field such as c.office_location or c.home_location.
The order of these arguments within the function must remain the same. That is, the source location must always be the first argument within the geo_distance() function followed by the destination location.
The function computes the distance, in meters, using the latitude and longitude of the source and destination locations.
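To give a feel for the magnitudes that a distance in meters produces, the following sketch computes a great-circle distance between two latitude and longitude points with the haversine formula. This is an assumption for illustration only: the page doesn't say which distance formula geo_distance() uses, and the haversine_m helper and the hotel coordinates are hypothetical.

```python
from math import asin, cos, radians, sin, sqrt

EARTH_RADIUS_M = 6_371_000  # mean Earth radius in meters


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    phi1, phi2 = radians(lat1), radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = radians(lon2 - lon1)
    a = sin(d_phi / 2) ** 2 + cos(phi1) * cos(phi2) * sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_M * asin(sqrt(a))


# Source: the user location from the request example above.
user_lat, user_lon = 52.23034637633789, 20.98339855121653
# Destination: a hypothetical hotel location.
hotel_lat, hotel_lon = 52.2400, 21.0100

distance_m = haversine_m(user_lat, user_lon, hotel_lat, hotel_lon)
print(f"{distance_m:.0f} m")
```

Because raw distances in meters can be several orders of magnitude larger than similarity scores, wrapping them in rr() before weighting keeps them on the same scale as the other signals.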
Ranking formula syntax
The custom ranking formula is a mathematical expression with the following components:
- Numbers (double): A positive or negative floating-point value that adds a weight to a signal or an expression.
- Signals (signal): The names of the signals listed in the Standard signals section.
- Arithmetic operators: + (addition) and * (multiplication).
- Mathematical functions:
  - log(expression): The natural logarithm
  - exp(expression): The natural exponent
  Each of these functions accepts exactly one argument, which is an expression written in terms of a signal.
  Examples of a valid function: exp(c.document_age) and log(keyword_similarity_score * 0.2 + 1.0).
- Reciprocal rank transformation function (rr): This function is expressed as rr(expression, k). It first sorts documents by the value of the expression in descending order and assigns the documents a rank. It then calculates the final value using the expression 1 / (rank_i + k), where rank_i is the document's position in the sorted list starting from 0 and k is a positive floating-point number you provide.
  The rr() function transforms all scores to the same scale and eliminates the need for additional normalization.
- Not a number (NaN) handling functions:
  - is_nan(expression): When the expression evaluates to being NaN, such as when a signal is missing for a document, 1 is returned. Otherwise, 0 is returned.
  - fill_nan(arg_expression, fill_with_expression): If arg_expression evaluates to being a NaN, returns fill_with_expression. Otherwise, returns arg_expression. This is crucial for handling documents that might be missing certain signals.
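The behavior of rr(), is_nan(), and fill_nan() described above can be reproduced locally to reason about a formula before sending it. The following is a minimal sketch that follows the definitions on this page; the signal values are made up, and tie-breaking by input order is an assumption because the page doesn't describe how ties are ranked.

```python
import math


def is_nan(value):
    """Return 1 if the value is NaN, otherwise 0."""
    return 1 if math.isnan(value) else 0


def fill_nan(value, fill_with):
    """Return fill_with if the value is NaN, otherwise the value itself."""
    return fill_with if math.isnan(value) else value


def rr(values, k):
    """Reciprocal rank transformation over one signal for a list of documents.

    Documents are sorted by value in descending order, rank_i starts at 0
    for the highest value, and each document receives 1 / (rank_i + k).
    Ties keep their input order (an assumption). Apply fill_nan() first,
    because NaN values don't sort meaningfully.
    """
    order = sorted(range(len(values)), key=lambda i: values[i], reverse=True)
    scores = [0.0] * len(values)
    for rank, index in enumerate(order):
        scores[index] = 1.0 / (rank + k)
    return scores


# Hypothetical signals for three documents; the second has no semantic score.
semantic = [0.82, float("nan"), 0.67]
distance_km = [12.0, 3.5, 40.0]

semantic_rr = rr([fill_nan(s, 0) for s in semantic], 40)
# Negate the distance so that the nearest document sorts first.
distance_rr = rr([d * -1 for d in distance_km], 40)

print(semantic_rr)
print(distance_rr)
```

The transformed values depend only on the sort order, not on the raw magnitudes, which is why signals with very different units can be weighted and summed after rr().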
Ranking formula examples
Here are a few examples of ranking formulas that you can use in the rankingExpression field of your search request:
- An elementary linear combination:
  semantic_similarity_score * 0.7 + keyword_similarity_score * 0.3
- A complex formula using reciprocal rank and NaN handling:
  rr(fill_nan(semantic_similarity_score, 0), 40) * 0.5 + topicality_rank * 0.5
- A complex formula using reciprocal rank, exponential function, and NaN handling:
  rr(fill_nan(semantic_similarity_score, 0), 40) * 0.2 + exp(keyword_similarity_score) * 0.3 + is_nan(keyword_similarity_score) * 0.1
- A complex formula using reciprocal rank with the geo_distance() function (Private preview):
  rr(keyword_similarity_score, 16) * 0.8 + rr(geo_distance(query_loc, c.office_location) * -1, 16) * 0.2
  In this formula, the multiplication factor is a negative value so that larger distances correspond to lower expression values and, therefore, the reciprocal rank transformation places greater distances lower in the sorted order.
Signals in the response
When a document is returned in the search response, the search result lists the standard and custom signals that contribute to retrieving the document from the data store. The rankSignals field lists these signals.
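A response parsed from JSON can be flattened into one record of signals per document, which is convenient for inspection or tuning. The following sketch works on a hand-written, trimmed-down response. Only rankSignals and keywordSimilarityScore are named on this page; the other field names, the shape of customSignals as name and value pairs, and the extract_rank_signals helper are assumptions to verify against the API reference.

```python
def extract_rank_signals(search_response):
    """Map each document ID to a flat dict of its ranking signals."""
    signals_by_doc = {}
    for result in search_response.get("results", []):
        # Copy so that the original response isn't modified.
        rank_signals = dict(result.get("rankSignals", {}))
        # Assumed shape: custom signals arrive as a list of name/value pairs.
        for custom in rank_signals.pop("customSignals", []):
            rank_signals["c." + custom["name"]] = custom["value"]
        signals_by_doc[result["id"]] = rank_signals
    return signals_by_doc


# Hypothetical response for illustration only.
sample_response = {
    "results": [
        {
            "id": "hotel-d",
            "rankSignals": {
                "keywordSimilarityScore": 4.5,
                "semanticSimilarityScore": 8.0,
                "customSignals": [
                    {"name": "distance_from_airport", "value": 1.0}
                ],
            },
        }
    ]
}

signals = extract_rank_signals(sample_response)
print(signals["hotel-d"])
```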
Text fields for keyword similarity
In structured data stores, to obtain the keywordSimilarityScore signal in your search response, you must update your schema to do the following:
- Map the text fields essential for keyword matching to the key properties title and description
- Update the annotation for the text fields as Searchable
Customize ranking using ranking formula in search
To customize the ranking for your documents in your search results, manually draft a formula and add it to your search API call.
- Formulate a ranking expression.
- Get search results.

    curl -X POST -H "Authorization: Bearer $(gcloud auth print-access-token)" \
      -H "Content-Type: application/json" \
      "https://discoveryengine.googleapis.com/v1/projects/PROJECT_ID/locations/global/collections/default_collection/engines/APP_ID/servingConfigs/default_search:search" \
      -d '{
        "servingConfig": "projects/PROJECT_ID/locations/global/collections/default_collection/engines/APP_ID/servingConfigs/default_search",
        "query": "QUERY",
        "rankingExpression": "RANKING_EXPRESSION",
        "rankingExpressionBackend": "RANK_BY_FORMULA"
      }'

  Replace the following:
  - PROJECT_ID: the ID of your Google Cloud project.
  - APP_ID: the ID of the Vertex AI Search app that you want to query.
  - QUERY: the query text to search.
  - RANKING_EXPRESSION: the custom ranking formula that you can write using the available signals with a valid ranking formula syntax.
    - For valid examples, see Ranking formula examples.
    - To tune the ranking formula, which can give you the best results, see Tune ranking formula using the Python library.
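The same request can be assembled in Python. The following sketch uses only the standard library and mirrors the URL, headers, and body fields of the curl command above. The build_search_request helper and the sample argument values are hypothetical, and obtaining the access token (for example, with gcloud auth print-access-token) is left to the caller.

```python
import json
import urllib.request


def build_search_request(project_id, app_id, query, ranking_expression, access_token):
    """Build (but don't send) a search request that uses a custom formula."""
    serving_config = (
        f"projects/{project_id}/locations/global/collections/default_collection"
        f"/engines/{app_id}/servingConfigs/default_search"
    )
    body = {
        "servingConfig": serving_config,
        "query": query,
        "rankingExpression": ranking_expression,
        "rankingExpressionBackend": "RANK_BY_FORMULA",
    }
    return urllib.request.Request(
        url=f"https://discoveryengine.googleapis.com/v1/{serving_config}:search",
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


# Placeholder values for illustration.
request = build_search_request(
    project_id="my-project",
    app_id="my-app",
    query="hotels near the airport",
    ranking_expression="semantic_similarity_score * 0.7 + keyword_similarity_score * 0.3",
    access_token="ACCESS_TOKEN",
)
print(request.full_url)
# To send the request, pass it to urllib.request.urlopen(request)
# and parse the JSON response body.
```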
Tune ranking formula using the Python library
For more advanced use cases, finding the optimal weights for your formula can be challenging. To overcome this, you can use Vertex AI Search's ranking tuning Python library, which is an open-source tool, and arrive at a suitable formula for your use case.
The general workflow is as follows:
- Prepare a dataset of queries with corresponding golden labels. These golden labels can be unique identifying fields, such as the document ID, that can help you associate the SearchResult object in the search response.
- For a set of representative queries, call the search API to get the available ranking signals for all returned documents. You can find this in the SearchResult.rankSignals field. Store this data along with your golden labels.
- Use the Python library to train a ranking model on this dataset. For more information, see Clearbox Python library.
- Convert the formula from the training results into a ranking expression, which you can then use in your API calls.
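To make the workflow concrete without depending on the tuning library, the following sketch runs a plain grid search over two weights and then formats the best pair as a ranking expression. It is a simplified stand-in, not the library's API or training method: the dataset layout, the helper names, the toy scores, and the use of mean reciprocal rank as the objective are all assumptions for illustration.

```python
def mean_reciprocal_rank(dataset, w_semantic, w_keyword):
    """Average of 1 / position of the golden document across all queries."""
    total = 0.0
    for example in dataset:
        ranked = sorted(
            example["candidates"],
            key=lambda d: w_semantic * d["semantic"] + w_keyword * d["keyword"],
            reverse=True,
        )
        position = [d["id"] for d in ranked].index(example["golden_id"])
        total += 1.0 / (position + 1)
    return total / len(dataset)


def tune_weights(dataset, steps=10):
    """Grid-search the semantic weight in [0, 1]; the keyword weight is the rest."""
    best = (-1.0, 0.0, 0.0)
    for i in range(steps + 1):
        w_semantic = i / steps
        w_keyword = 1.0 - w_semantic
        score = mean_reciprocal_rank(dataset, w_semantic, w_keyword)
        if score > best[0]:  # keep the first weight pair that reaches the best score
            best = (score, w_semantic, w_keyword)
    return best


def to_ranking_expression(w_semantic, w_keyword):
    """Format the tuned weights as a formula for the rankingExpression field."""
    return (
        f"semantic_similarity_score * {w_semantic:.1f}"
        f" + keyword_similarity_score * {w_keyword:.1f}"
    )


# Hypothetical signals collected from search responses, with golden labels.
dataset = [
    {"golden_id": "b", "candidates": [
        {"id": "a", "semantic": 0.9, "keyword": 0.2},
        {"id": "b", "semantic": 0.6, "keyword": 0.9}]},
    {"golden_id": "c", "candidates": [
        {"id": "c", "semantic": 0.8, "keyword": 0.3},
        {"id": "d", "semantic": 0.7, "keyword": 0.1}]},
    {"golden_id": "e", "candidates": [
        {"id": "e", "semantic": 0.9, "keyword": 0.3},
        {"id": "f", "semantic": 0.5, "keyword": 0.8}]},
]

best_mrr, best_w_semantic, best_w_keyword = tune_weights(dataset)
expression = to_ranking_expression(best_w_semantic, best_w_keyword)
print(best_mrr, expression)
```

A grid search like this only scales to a handful of weights; with more signals or nonlinear terms such as rr() and log(), a trained ranking model is the more practical route.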
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies.
Last updated 2026-02-19 UTC.