Perform Maximal Marginal Relevance search with LangChain on Bigtable
This page describes how to perform a Maximal Marginal Relevance (MMR) search using the BigtableVectorStore integration for LangChain in Bigtable, with Vertex AI as the embedding service.
MMR is a search technique used in information retrieval to return a set of results that are both relevant to the query and diverse, avoiding redundancy. While standard vector similarity search (for example, a search that uses the k-nearest neighbor method) might return many similar items, MMR provides a more varied set of top results. This is useful when you have potentially overlapping or duplicative data in your vector store.
For example, in an ecommerce application, if a user searches for "red tomatoes", then a vector similarity search might return multiple listings of the same type of fresh red tomato. An MMR search would aim to return a more diverse set, such as "fresh red tomatoes", "canned diced red tomatoes", "organic cherry tomatoes", and perhaps even "tomato salad recipe".
Before you read this page, it's important that you know the following concepts:
- Relevance: A measure of how closely the document matches the query.
- Diversity: A measure of how different a document is from the documents already selected in the result set.
- Lambda Multiplier: A factor between 0 and 1 that balances relevance and diversity. A value closer to 1 prioritizes relevance, while a value closer to 0 prioritizes diversity.
The BigtableVectorStore class for LangChain implements MMR as a re-ranking algorithm. This algorithm first fetches a larger set of documents relevant to the query, and then selects the documents that best balance relevance and diversity.
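To illustrate the re-ranking step, the following is a minimal, self-contained sketch of MMR selection over precomputed embeddings. The `mmr_rerank` helper and the toy vectors are illustrative only; they are not part of the BigtableVectorStore API, which performs this selection for you.

```python
# A minimal sketch of MMR re-ranking over precomputed embeddings.
# Each candidate's score mixes relevance to the query with a penalty
# for similarity to candidates already selected.
import numpy as np


def mmr_rerank(query_vec, candidate_vecs, k, lambda_mult=0.5):
    """Return the indices of k candidates, balancing relevance and diversity."""

    def cos(a, b):
        return float(np.dot(a, b) / (np.linalg.norm(a) * np.linalg.norm(b)))

    selected = []
    remaining = list(range(len(candidate_vecs)))
    while remaining and len(selected) < k:
        best_idx, best_score = None, -np.inf
        for i in remaining:
            relevance = cos(query_vec, candidate_vecs[i])
            # Penalty: similarity to the closest already-selected document.
            diversity_penalty = max(
                (cos(candidate_vecs[i], candidate_vecs[j]) for j in selected),
                default=0.0,
            )
            score = lambda_mult * relevance - (1 - lambda_mult) * diversity_penalty
            if score > best_score:
                best_idx, best_score = i, score
        selected.append(best_idx)
        remaining.remove(best_idx)
    return selected
```

With `lambda_mult=1.0` the selection reduces to plain similarity ranking; lowering it pushes later picks away from documents already chosen.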
Before you begin
This guide uses Vertex AI as the embedding service. Make sure you have the Vertex AI API enabled in your project.
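If the APIs aren't enabled yet, you can enable them with gcloud; the project ID below is a placeholder.

```shell
# Enable the Vertex AI and Bigtable APIs (replace the project ID).
gcloud services enable aiplatform.googleapis.com --project=your-project-id
gcloud services enable bigtable.googleapis.com --project=your-project-id
```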
Required roles
To use Bigtable with LangChain, you need the following IAM roles:
- Bigtable User (roles/bigtable.user) on the Bigtable instance.
- If you're initializing the table, you also need the Bigtable Administrator (roles/bigtable.admin) role.
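As one way to grant these roles, you can use gcloud; the project ID and member email below are placeholders.

```shell
# Grant the Bigtable User role to a user account (placeholders shown).
gcloud projects add-iam-policy-binding your-project-id \
    --member="user:you@example.com" \
    --role="roles/bigtable.user"
```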
Set up your environment
Install the required LangChain packages:

```shell
pip install --upgrade --quiet langchain-google-bigtable langchain-google-vertexai
```

Authenticate to Google Cloud with your user account:

```shell
gcloud auth application-default login
```

Set your project ID, Bigtable instance ID, and table ID:

```shell
PROJECT_ID="your-project-id"
INSTANCE_ID="your-instance-id"
TABLE_ID="your-table-id"
```
Initialize the embedding service and create a table
To use the Bigtable vector store, you need to provide embeddings generated by an AI model. In this guide, you use the text-embedding-004 text embedding model on Vertex AI.
```python
from langchain_google_vertexai import VertexAIEmbeddings
from langchain_google_bigtable.vector_store import (
    init_vector_store_table,
    BigtableVectorStore,
    ColumnConfig,
)

# Initialize an embedding service
embedding_service = VertexAIEmbeddings(
    model_name="text-embedding-004", project=PROJECT_ID
)

# Define column families
DATA_COLUMN_FAMILY = "product_data"

# Initialize the table (if it doesn't exist)
try:
    init_vector_store_table(
        project_id=PROJECT_ID,
        instance_id=INSTANCE_ID,
        table_id=TABLE_ID,
        content_column_family=DATA_COLUMN_FAMILY,
        embedding_column_family=DATA_COLUMN_FAMILY,
    )
    print(f"Table {TABLE_ID} created successfully.")
except ValueError as e:
    print(e)  # Table likely already exists
```

Instantiate BigtableVectorStore
Create the store instance by passing the embedding service and the Bigtable table identifiers.
```python
# Configure columns
content_column = ColumnConfig(
    column_family=DATA_COLUMN_FAMILY, column_qualifier="product_description"
)
embedding_column = ColumnConfig(
    column_family=DATA_COLUMN_FAMILY, column_qualifier="embedding"
)

# Create the vector store instance
vector_store = BigtableVectorStore.create_sync(
    project_id=PROJECT_ID,
    instance_id=INSTANCE_ID,
    table_id=TABLE_ID,
    embedding_service=embedding_service,
    collection="ecommerce_products",
    content_column=content_column,
    embedding_column=embedding_column,
)
print("BigtableVectorStore instantiated.")
```

Populate the vector store
In this guide, we use a sample scenario of a fictional ecommerce service where users want to search for items related to red tomatoes. First, we need to add some tomato-related products and descriptions to the vector store.
```python
from langchain_core.documents import Document

products = [
    Document(
        page_content="Fresh organic red tomatoes, great for salads.",
        metadata={"type": "fresh produce", "color": "red", "name": "Organic Vine Tomatoes"},
    ),
    Document(
        page_content="Ripe red tomatoes on the vine.",
        metadata={"type": "fresh", "color": "red", "name": "Tomatoes on Vine"},
    ),
    Document(
        page_content="Sweet cherry tomatoes, red and juicy.",
        metadata={"type": "fresh", "color": "red", "name": "Cherry Tomatoes"},
    ),
    Document(
        page_content="Canned diced red tomatoes in juice.",
        metadata={"type": "canned", "color": "red", "name": "Diced Tomatoes"},
    ),
    Document(
        page_content="Sun-dried tomatoes in oil.",
        metadata={"type": "preserved", "color": "red", "name": "Sun-Dried Tomatoes"},
    ),
    Document(
        page_content="Green tomatoes, perfect for frying.",
        metadata={"type": "fresh", "color": "green", "name": "Green Tomatoes"},
    ),
    Document(
        page_content="Tomato paste, concentrated flavor.",
        metadata={"type": "canned", "color": "red", "name": "Tomato Paste"},
    ),
    Document(
        page_content="Mixed salad greens with cherry tomatoes.",
        metadata={"type": "prepared", "color": "mixed", "name": "Salad Mix with Tomatoes"},
    ),
    Document(
        page_content="Yellow pear tomatoes, mild flavor.",
        metadata={"type": "fresh", "color": "yellow", "name": "Yellow Pear Tomatoes"},
    ),
    Document(
        page_content="Heirloom tomatoes, various colors.",
        metadata={"type": "fresh", "color": "various", "name": "Heirloom Tomatoes"},
    ),
]
vector_store.add_documents(products)
print(f"Added {len(products)} products to the vector store.")
```

Perform MMR Search
The store contains many tomato-related products, but users want to search for offerings associated with "red tomatoes" only. To get a diverse set of results, use the MMR technique to perform the search.
The key method to use is max_marginal_relevance_search, which takes the following arguments:
- query (str): The search text.
- k (int): The final number of search results.
- fetch_k (int): The initial number of similar products to retrieve before applying the MMR algorithm. We recommend that this number be larger than the k parameter.
- lambda_mult (float): The diversity tuning parameter. Use 0.0 for maximum diversity, 1.0 for maximum relevance.
```python
user_query = "red tomatoes"
k_results = 4
fetch_k_candidates = 10

print(f"Performing MMR search for: '{user_query}'")

# Example 1: Balanced relevance and diversity
mmr_results_balanced = vector_store.max_marginal_relevance_search(
    user_query, k=k_results, fetch_k=fetch_k_candidates, lambda_mult=0.5
)
print(f"MMR Results (lambda=0.5, k={k_results}, fetch_k={fetch_k_candidates}):")
for doc in mmr_results_balanced:
    print(f" - {doc.metadata['name']}: {doc.page_content}")
print("\n")

# Example 2: Prioritizing diversity
mmr_results_diverse = vector_store.max_marginal_relevance_search(
    user_query, k=k_results, fetch_k=fetch_k_candidates, lambda_mult=0.1
)
print(f"MMR Results (lambda=0.1, k={k_results}, fetch_k={fetch_k_candidates}):")
for doc in mmr_results_diverse:
    print(f" - {doc.metadata['name']}: {doc.page_content}")
print("\n")

# Example 3: Prioritizing relevance
mmr_results_relevant = vector_store.max_marginal_relevance_search(
    user_query, k=k_results, fetch_k=fetch_k_candidates, lambda_mult=0.9
)
print(f"MMR Results (lambda=0.9, k={k_results}, fetch_k={fetch_k_candidates}):")
for doc in mmr_results_relevant:
    print(f" - {doc.metadata['name']}: {doc.page_content}")
```

Different lambda_mult values yield different sets of results, balancing the similarity to "red tomatoes" with the uniqueness of the products shown.
Using MMR with a retriever
To use MMR search, you can also configure a LangChain retriever. The retriever provides a uniform interface that lets you seamlessly integrate specialized search methods, such as MMR, directly into your chains and applications.
```python
retriever = vector_store.as_retriever(
    search_type="mmr",
    search_kwargs={
        "k": 3,
        "fetch_k": 10,
        "lambda_mult": 0.3,
    },
)
retrieved_docs = retriever.invoke(user_query)
print(f"\nRetriever MMR Results for '{user_query}':")
for doc in retrieved_docs:
    print(f" - {doc.metadata['name']}: {doc.page_content}")
```

What's next
- Explore other search types and filtering options available in BigtableVectorStore.
- Learn more about LangChain in Bigtable.
Except as otherwise noted, the content of this page is licensed under the Creative Commons Attribution 4.0 License, and code samples are licensed under the Apache 2.0 License. For details, see the Google Developers Site Policies. Java is a registered trademark of Oracle and/or its affiliates.
Last updated 2025-12-15 UTC.