Perform efficient top-k retrieval

Note: This feature is available with the Spanner Enterprise edition and Enterprise Plus edition. For more information, see theSpanner editions overview.

Many applications query a database to populate a single page in theirapplications. In such applications, the application doesn't need all of thematches, but only the top-k matches based on index sort order. Search indexescan implement this type of search very efficiently. This page describes how tocreate and search an index that has top-k matching.

Create search indexes for top-k matches

To configure a search index for top-k matching, useORDER BY to order thesearch index by a specific column. Queries need to have anORDER BY clausethat exactly matches the search index sort order (including ascending versusdescending direction) and aLIMIT clause that requests the query to stop afterfinding k-matching rows.

You can also implement pagination using these clauses. For more information, seePaginate search queries.

For some use cases, it might make sense to maintain multiple search indexessorted by different columns. Likepartitioning,it's a trade-off between storage and write cost versus query latency.

For example, consider a table that uses the following schema:

GoogleSQL

CREATETABLEAlbums(AlbumIdSTRING(MAX)NOTNULL,RecordTimestampINT64NOTNULL,ReleaseTimestampINT64NOTNULL,ListenTimestampINT64NOTNULL,AlbumTitleSTRING(MAX),AlbumTitle_TokensTOKENLISTAS(TOKENIZE_FULLTEXT(AlbumTitle))HIDDEN)PRIMARYKEY(AlbumId);CREATESEARCHINDEXAlbumsRecordTimestampIndexONAlbums(AlbumTitle_Tokens,SingerId_Tokens)STORING(ListenTimestamp)ORDERBYRecordTimestampDESCCREATESEARCHINDEXAlbumsReleaseTimestampIndexONAlbums(AlbumTitle_Tokens)STORING(ListenTimestamp)ORDERBYReleaseTimestampDESC

PostgreSQL

CREATETABLEalbums(albumidcharactervaryingNOTNULL,recordtimestampbigintNOTNULL,releasetimestampbigintNOTNULL,listentimestampbigintNOTNULL,albumtitlecharactervarying,albumtitle_tokensspanner.tokenlistGENERATEDALWAYSAS(spanner.tokenize_fulltext(albumtitle))VIRTUALHIDDEN,PRIMARYKEY(albumid));CREATESEARCHINDEXalbumsrecordtimestampindexONAlbums(albumtitle_tokens,singerid_tokens)INCLUDE(listentimestamp)ORDERBYrecordtimestampDESCCREATESEARCHINDEXalbumsreleasetimestampindexONAlbums(albumtitle_tokens)INCLUDE(listentimestamp)ORDERBYreleasetimestampDESC

Query search indexes for top-k matches

As stated previously, queries need to have anORDER BY clausethat exactly matches the search index sort order (including ascending versusdescending direction) and aLIMIT clause that requests the query to stop afterfinding k-matching rows.

The following list analyzes the efficiency of some common queries.

  • This query is very efficient. It selects theAlbumsRecordTimestampIndexindex. Even if there are many albums with the word "happy", the query onlyscans a small number of rows:

    GoogleSQL

    SELECTAlbumIdFROMAlbumsWHERESEARCH(AlbumTitle_Tokens,'happy')ORDERBYRecordTimestampDESCLIMIT10

    PostgreSQL

    SELECTalbumidFROMalbumsWHEREspanner.search(albumtitle_tokens,'happy')ORDERBYrecordtimestampDESCLIMIT10
  • The same query, requesting sort order byReleaseTimestamp in descendingorder, uses theAlbumsReleaseTimestampIndex index and is equallyefficient:

    GoogleSQL

    SELECTAlbumIdFROMAlbumsWHERESEARCH(AlbumTitle_Tokens,'happy')ORDERBYReleaseTimestampDESCLIMIT10

    PostgreSQL

    SELECTalbumidFROMalbumsWHEREspanner.search(albumtitle_tokens,'happy')ORDERBYreleasetimestampDESCLIMIT10
  • A query that requests sort order byListenTimestamp doesn't execute atop-k query efficiently. It has to fetch all matching albums, sort them byListenTimestamp, and return the top 10. Such a query uses more resourcesif there's a large number of documents that contain the term "happy".

    GoogleSQL

    SELECTAlbumIdFROMAlbumsWHERESEARCH(AlbumTitle_Tokens,'happy')ORDERBYListenTimestampDESCLIMIT10

    PostgreSQL

    SELECTalbumidFROMalbumsWHEREspanner.search(albumtitle_tokens,'happy')ORDERBYlistentimestampDESCLIMIT10
  • Similarly, a query doesn't run efficiently if it requests that results areordered using theRecordTimestamp column in ascending order. It scans allrows with the word "happy", despite having aLIMIT.

    GoogleSQL

    SELECTAlbumIdFROMAlbumsWHERESEARCH(AlbumTitle_Tokens,'happy')ORDERBYRecordTimestampASCLIMIT10

    PostgreSQL

    SELECTalbumidFROMalbumsWHEREspanner.search(albumtitle_tokens,'happy')ORDERBYrecordtimestampASCLIMIT10

What's next

Except as otherwise noted, the content of this page is licensed under theCreative Commons Attribution 4.0 License, and code samples are licensed under theApache 2.0 License. For details, see theGoogle Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.

Last updated 2025-12-15 UTC.