
Couchbase

Couchbase is an award-winning distributed NoSQL cloud database that delivers unmatched versatility, performance, scalability, and financial value for all of your cloud, mobile, AI, and edge computing applications. Couchbase embraces AI with coding assistance for developers and vector search for their applications.

Vector Search is a part of the Full Text Search Service (Search Service) in Couchbase.

This tutorial explains how to use Vector Search in Couchbase. You can work with either Couchbase Capella or your self-managed Couchbase Server.

Setup

To access the CouchbaseSearchVectorStore you first need to install the langchain-couchbase partner package:

pip install -qU langchain-couchbase

Credentials

Head over to the Couchbase website and create a new connection, making sure to save your connection string, database username, and password:

import getpass

COUCHBASE_CONNECTION_STRING = getpass.getpass(
    "Enter the connection string for the Couchbase cluster: "
)
DB_USERNAME = getpass.getpass("Enter the username for the Couchbase cluster: ")
DB_PASSWORD = getpass.getpass("Enter the password for the Couchbase cluster: ")
Enter the connection string for the Couchbase cluster:  ········
Enter the username for the Couchbase cluster: ········
Enter the password for the Couchbase cluster: ········

If you want best-in-class automated tracing of your model calls, you can also set your LangSmith API key by uncommenting the lines below:

# import os
# os.environ["LANGSMITH_TRACING"] = "true"
# os.environ["LANGSMITH_API_KEY"] = getpass.getpass()

Initialization

Before instantiating the vector store, we need to create a connection to the Couchbase cluster.

Create Couchbase Connection Object

We first create a connection to the Couchbase cluster and then pass the cluster object to the vector store.

Here, we connect using the username and password from above. You can also connect to your cluster using any other supported authentication method.

For more information on connecting to the Couchbase cluster, please check the documentation.

from datetime import timedelta

from couchbase.auth import PasswordAuthenticator
from couchbase.cluster import Cluster
from couchbase.options import ClusterOptions

auth = PasswordAuthenticator(DB_USERNAME, DB_PASSWORD)
options = ClusterOptions(auth)
cluster = Cluster(COUCHBASE_CONNECTION_STRING, options)

# Wait until the cluster is ready for use.
cluster.wait_until_ready(timedelta(seconds=5))

We will now set the bucket, scope, and collection names in the Couchbase cluster that we want to use for Vector Search.

For this example, we are using the default scope and collection.

BUCKET_NAME="langchain_bucket"
SCOPE_NAME="_default"
COLLECTION_NAME="_default"
SEARCH_INDEX_NAME="langchain-test-index"

For details on how to create a Search index with support for Vector fields, please refer to the documentation.

Simple Instantiation

Below, we create the vector store object with the cluster information and the search index name.

pip install -qU langchain-openai
import getpass
import os

if not os.environ.get("OPENAI_API_KEY"):
    os.environ["OPENAI_API_KEY"] = getpass.getpass("Enter API key for OpenAI: ")

from langchain_openai import OpenAIEmbeddings

embeddings = OpenAIEmbeddings(model="text-embedding-3-large")

from langchain_couchbase.vectorstores import CouchbaseSearchVectorStore

vector_store = CouchbaseSearchVectorStore(
    cluster=cluster,
    bucket_name=BUCKET_NAME,
    scope_name=SCOPE_NAME,
    collection_name=COLLECTION_NAME,
    embedding=embeddings,
    index_name=SEARCH_INDEX_NAME,
)

Specify the Text & Embeddings Field

You can optionally specify the document's text and embedding fields using the text_key and embedding_key parameters.

vector_store_specific= CouchbaseSearchVectorStore(
cluster=cluster,
bucket_name=BUCKET_NAME,
scope_name=SCOPE_NAME,
collection_name=COLLECTION_NAME,
embedding=embeddings,
index_name=SEARCH_INDEX_NAME,
text_key="text",
embedding_key="embedding",
)
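As a rough picture of what these two parameters refer to, the sketch below builds the JSON body that, we assume, langchain-couchbase writes for each document: the page content under text_key, the vector under embedding_key, and the metadata under a "metadata" object. The helper name to_stored_json and the toy 3-dimensional vector are illustrative only. Whatever names you choose, the Search index has to map the embedding_key field as a vector field.

```python
import json

# Assumption: each Document is stored as one JSON object with the configured
# text_key and embedding_key, plus a nested "metadata" object.
TEXT_KEY = "text"
EMBEDDING_KEY = "embedding"


def to_stored_json(page_content, vector, metadata):
    """Build the (assumed) JSON body written to the collection for one document."""
    return {
        TEXT_KEY: page_content,
        EMBEDDING_KEY: list(vector),
        "metadata": dict(metadata),
    }


doc = to_stored_json(
    "I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    [0.12, -0.03, 0.48],  # toy vector; text-embedding-3-large returns 3072 dimensions
    {"source": "tweet"},
)
print(json.dumps(doc, indent=2))
```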

Manage vector store

Once you have created your vector store, you can interact with it by adding and deleting items.

Add items to vector store

We can add items to our vector store by using the add_documents method.

from uuid import uuid4

from langchain_core.documents import Document

document_1= Document(
page_content="I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
metadata={"source":"tweet"},
)

document_2= Document(
page_content="The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees.",
metadata={"source":"news"},
)

document_3= Document(
page_content="Building an exciting new project with LangChain - come check it out!",
metadata={"source":"tweet"},
)

document_4= Document(
page_content="Robbers broke into the city bank and stole $1 million in cash.",
metadata={"source":"news"},
)

document_5= Document(
page_content="Wow! That was an amazing movie. I can't wait to see it again.",
metadata={"source":"tweet"},
)

document_6= Document(
page_content="Is the new iPhone worth the price? Read this review to find out.",
metadata={"source":"website"},
)

document_7= Document(
page_content="The top 10 soccer players in the world right now.",
metadata={"source":"website"},
)

document_8= Document(
page_content="LangGraph is the best framework for building stateful, agentic applications!",
metadata={"source":"tweet"},
)

document_9= Document(
page_content="The stock market is down 500 points today due to fears of a recession.",
metadata={"source":"news"},
)

document_10= Document(
page_content="I have a bad feeling I am going to get deleted :(",
metadata={"source":"tweet"},
)

documents=[
document_1,
document_2,
document_3,
document_4,
document_5,
document_6,
document_7,
document_8,
document_9,
document_10,
]
uuids = [str(uuid4()) for _ in range(len(documents))]

vector_store.add_documents(documents=documents, ids=uuids)
['f125b836-f555-4449-98dc-cbda4e77ae3f',
'a28fccde-fd32-4775-9ca8-6cdb22ca7031',
'b1037c4b-947f-497f-84db-63a4def5080b',
'c7082b74-b385-4c4b-bbe5-0740909c01db',
'a7e31f62-13a5-4109-b881-8631aff7d46c',
'9fcc2894-fdb1-41bd-9a93-8547747650f4',
'a5b0632d-abaf-4802-99b3-df6b6c99be29',
'0475592e-4b7f-425d-91fd-ac2459d48a36',
'94c6db4e-ba07-43ff-aa96-3a5d577db43a',
'd21c7feb-ad47-4e7d-84c5-785afb189160']

Delete items from vector store

vector_store.delete(ids=[uuids[-1]])
True

Query vector store

Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent.

Query directly

Similarity search

Performing a simple similarity search can be done as follows:

results= vector_store.similarity_search(
"LangChain provides abstractions to make working with LLMs easy",
k=2,
)
for res in results:
    print(f"* {res.page_content} [{res.metadata}]")
* Building an exciting new project with LangChain - come check it out! [{'source': 'tweet'}]
* LangGraph is the best framework for building stateful, agentic applications! [{'source': 'tweet'}]

Similarity search with Score

You can also fetch the scores for the results by calling the similarity_search_with_score method.

results= vector_store.similarity_search_with_score("Will it be hot tomorrow?", k=1)
for res, score in results:
    print(f"* [SIM={score:3f}] {res.page_content} [{res.metadata}]")
* [SIM=0.553112] The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}]

Filtering Results

You can filter the search results by specifying any filter on the text or metadata in the document that is supported by the Couchbase Search service.

The filter can be any valid SearchQuery supported by the Couchbase Python SDK. These filters are applied before the Vector Search is performed.

If you want to filter on one of the fields in the metadata, you need to specify it using dot notation.

For example, to filter on the source field in the metadata, you need to specify metadata.source.

Note that the filter needs to be supported by the Search Index.
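The dotted names follow the nesting of the stored JSON. As an illustration, the hypothetical helper dotted_paths below lists the field name you would use for each leaf of a metadata dict. The nested author entry is made up for the example; the documents in this guide only have a flat source field.

```python
def dotted_paths(obj, prefix="metadata"):
    """Yield (path, value) for every leaf of a nested dict, using dot notation."""
    for key, value in obj.items():
        path = f"{prefix}.{key}"
        if isinstance(value, dict):
            yield from dotted_paths(value, path)  # descend into child objects
        else:
            yield path, value


metadata = {"source": "news", "author": {"name": "Jane Doe"}}
for path, value in dotted_paths(metadata):
    print(path, "=", value)
# metadata.source = news
# metadata.author.name = Jane Doe
```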

from couchbase import search

query = "Are there any concerning financial news?"
filter_on_source = search.MatchQuery("news", field="metadata.source")
results = vector_store.similarity_search_with_score(
    query, fields=["metadata.source"], filter=filter_on_source, k=5
)
for res, score in results:
    print(f"* {res.page_content} [{res.metadata}] {score}")
* The stock market is down 500 points today due to fears of a recession. [{'source': 'news'}] 0.3873019218444824
* Robbers broke into the city bank and stole $1 million in cash. [{'source': 'news'}] 0.20637212693691254
* The weather forecast for tomorrow is cloudy and overcast, with a high of 62 degrees. [{'source': 'news'}] 0.10404900461435318

Specifying Fields to Return

You can specify the fields to return from the document using the fields parameter in the searches. These fields are returned as part of the metadata object in the returned Document. You can fetch any field that is stored in the Search index. The text_key of the document is returned as part of the document's page_content.

If you do not specify any fields to be fetched, all the fields stored in the index are returned.

If you want to fetch one of the fields in the metadata, you need to specify it using dot notation.

For example, to fetch the source field in the metadata, you need to specify metadata.source.

query="What did I eat for breakfast today?"
results= vector_store.similarity_search(query, fields=["metadata.source"])
print(results[0])
page_content='I had chocolate chip pancakes and scrambled eggs for breakfast this morning.' metadata={'source': 'tweet'}
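The fields parameter uses the same dotted names. Assuming the Search service returns stored fields keyed by those dotted paths, the hypothetical helper fields_to_metadata below shows how they map back onto a nested metadata dict, which is the shape of the Document printed above. It only illustrates the mapping and is not langchain-couchbase's own code.

```python
def fields_to_metadata(fields, prefix="metadata."):
    """Rebuild a nested metadata dict from dotted field names (hypothetical helper)."""
    metadata = {}
    for name, value in fields.items():
        if not name.startswith(prefix):
            continue  # e.g. the text_key field becomes page_content, not metadata
        node = metadata
        *parents, leaf = name[len(prefix):].split(".")
        for part in parents:
            node = node.setdefault(part, {})  # create intermediate objects as needed
        node[leaf] = value
    return metadata


returned = {
    "text": "I had chocolate chip pancakes and scrambled eggs for breakfast this morning.",
    "metadata.source": "tweet",
}
print(fields_to_metadata(returned))  # {'source': 'tweet'}
```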

Query by turning into retriever

You can also transform the vector store into a retriever for easier usage in your chains.

Here is how to transform your vector store into a retriever and then invoke the retriever with a simple query and filter.

retriever= vector_store.as_retriever(
search_type="similarity",
search_kwargs={"k":1,"score_threshold":0.5},
)
filter_on_source= search.MatchQuery("news", field="metadata.source")
retriever.invoke("Stealing from the bank is a crime",filter=filter_on_source)
[Document(id='c7082b74-b385-4c4b-bbe5-0740909c01db', metadata={'source': 'news'}, page_content='Robbers broke into the city bank and stole $1 million in cash.')]

Hybrid Queries

Couchbase allows you to do hybrid searches by combining Vector Search results with searches on non-vector fields of the document, such as the metadata object.

The results are based on the combination of the Vector Search results and the results of the searches supported by the Search Service. The scores of the component searches are added together to get the total score of each result.

To perform hybrid searches, there is an optional parameter, search_options, that can be passed to all the similarity searches.
The different search/query possibilities for search_options can be found in the Couchbase Search documentation.
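Because search_options is a plain dictionary, it can help to build it with small functions. The helpers below (match_query, numeric_range_query, date_range_query, conjuncts, disjuncts, search_options) are hypothetical conveniences, not part of langchain-couchbase or the Couchbase SDK. They only emit the same dictionary shapes used in the examples in this guide and leave out any option you do not set explicitly.

```python
def _without_none(**fields):
    """Drop keys whose value is None so only explicitly set options are sent."""
    return {key: value for key, value in fields.items() if value is not None}


def match_query(field, text, fuzziness=None):
    return _without_none(field=field, match=text, fuzziness=fuzziness)


def numeric_range_query(field, lo=None, hi=None, inclusive_min=None, inclusive_max=None):
    return _without_none(field=field, min=lo, max=hi,
                         inclusive_min=inclusive_min, inclusive_max=inclusive_max)


def date_range_query(field, start=None, end=None, inclusive_start=None, inclusive_end=None):
    return _without_none(field=field, start=start, end=end,
                         inclusive_start=inclusive_start, inclusive_end=inclusive_end)


def conjuncts(*queries):  # AND
    return {"conjuncts": list(queries)}


def disjuncts(*queries):  # OR
    return {"disjuncts": list(queries)}


def search_options(query):
    """Wrap a query dict in the shape passed as search_options=..."""
    return {"query": query}


opts = search_options(conjuncts(
    numeric_range_query("metadata.rating", lo=3, hi=4, inclusive_max=True),
    date_range_query("metadata.date", start="2016-12-31", end="2017-01-02"),
))
print(opts)
```

The result would be passed as vector_store.similarity_search_with_score(query, search_options=opts).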

Create Diverse Metadata for Hybrid Search

In order to simulate hybrid search, let us add some synthetic metadata to the chunks of a sample document. We add three fields to each chunk's metadata, cycling through the values in a fixed pattern: date from 2010 through 2019, rating from 1 through 5, and author alternating between John Doe and Jane Doe.

from langchain_community.document_loaders import TextLoader
from langchain_text_splitters import CharacterTextSplitter

loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=500, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

# Adding metadata to documents
for i, doc in enumerate(docs):
    doc.metadata["date"] = f"{range(2010, 2020)[i % 10]}-01-01"
    doc.metadata["rating"] = range(1, 6)[i % 5]
    doc.metadata["author"] = ["John Doe", "Jane Doe"][i % 2]

vector_store.add_documents(docs)

query="What did the president say about Ketanji Brown Jackson"
results= vector_store.similarity_search(query)
print(results[0].metadata)
{'author': 'John Doe', 'date': '2016-01-01', 'rating': 2, 'source': '../../how_to/state_of_the_union.txt'}
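Since the assignment is deterministic, each chunk's metadata depends only on its index. The sketch below repeats the loop's arithmetic in a standalone function (synthetic_metadata is a name chosen here for illustration). The pattern repeats every 10 chunks, and the metadata printed above (2016, rating 2, John Doe) is exactly what chunks with i % 10 == 6 receive.

```python
def synthetic_metadata(i):
    """Metadata assigned to the i-th chunk by the loop above."""
    return {
        "date": f"{range(2010, 2020)[i % 10]}-01-01",
        "rating": range(1, 6)[i % 5],
        "author": ["John Doe", "Jane Doe"][i % 2],
    }


# The pattern repeats every 10 chunks (the least common multiple of 10, 5 and 2).
assert all(synthetic_metadata(i) == synthetic_metadata(i + 10) for i in range(50))

print(synthetic_metadata(6))
# {'date': '2016-01-01', 'rating': 2, 'author': 'John Doe'}
```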

Query by Exact Value

We can search for exact matches on a textual field like the author in the metadata object.

query="What did the president say about Ketanji Brown Jackson"
results= vector_store.similarity_search(
query,
search_options={"query":{"field":"metadata.author","match":"John Doe"}},
fields=["metadata.author"],
)
print(results[0])
page_content='One of the most serious constitutional responsibilities a President has is nominating someone to serve on the United States Supreme Court.

And I did that 4 days ago, when I nominated Circuit Court of Appeals Judge Ketanji Brown Jackson. One of our nation’s top legal minds, who will continue Justice Breyer’s legacy of excellence.' metadata={'author': 'John Doe'}

Query by Partial Match

We can search for partial matches by specifying a fuzziness for the search. This is useful when you want to search for slight variations or misspellings of a search query.

Here, "Jae" is close (fuzziness of 1) to "Jane".

query="What did the president say about Ketanji Brown Jackson"
results= vector_store.similarity_search(
query,
search_options={
"query":{"field":"metadata.author","match":"Jae","fuzziness":1}
},
fields=["metadata.author"],
)
print(results[0])
page_content='A former top litigator in private practice. A former federal public defender. And from a family of public school educators and police officers. A consensus builder. Since she’s been nominated, she’s received a broad range of support—from the Fraternal Order of Police to former judges appointed by Democrats and Republicans.

And if we are to advance liberty and justice, we need to secure the Border and fix the immigration system.' metadata={'author': 'Jane Doe'}
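Couchbase describes fuzziness as an edit (Levenshtein) distance. The sketch below is a plain Python implementation of that distance, not Couchbase's matcher, showing why "Jae" is within fuzziness 1 of "Jane" but not of "John".

```python
def levenshtein(a, b):
    """Dynamic-programming edit distance (insertions, deletions, substitutions)."""
    prev = list(range(len(b) + 1))  # distances from "" to each prefix of b
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute (free if equal)
            ))
        prev = curr
    return prev[-1]


for name in ("Jane", "John"):
    print(name, levenshtein("Jae", name))
# Jane 1
# John 3
```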

Query by Date Range

We can search for documents whose value on a date field, such as metadata.date, falls within a date range.

query="Any mention about independence?"
results= vector_store.similarity_search(
query,
search_options={
"query":{
"start":"2016-12-31",
"end":"2017-01-02",
"inclusive_start":True,
"inclusive_end":False,
"field":"metadata.date",
}
},
)
print(results[0])
page_content='And with 75% of adult Americans fully vaccinated and hospitalizations down by 77%, most Americans can remove their masks, return to work, stay in the classroom, and move forward safely.

We achieved this because we provided free vaccines, treatments, tests, and masks.

Of course, continuing this costs money.

I will soon send Congress a request.

The vast majority of Americans have used these tools and may want to again, so I expect Congress to pass it quickly.' metadata={'author': 'Jane Doe', 'date': '2017-01-01', 'rating': 3, 'source': '../../how_to/state_of_the_union.txt'}
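The range above is a half-open window: 2016-12-31 is included and 2017-01-02 is excluded, so among the yearly dates generated earlier only 2017-01-01 qualifies. A local sketch of the same comparison follows; in_date_range is a hypothetical helper that compares calendar dates in Python and is not how Couchbase evaluates the query.

```python
from datetime import date


def in_date_range(value, start, end, inclusive_start, inclusive_end):
    """Check whether an ISO date string lies in [start, end] with the given bounds."""
    d, lo, hi = (date.fromisoformat(s) for s in (value, start, end))
    above = d >= lo if inclusive_start else d > lo
    below = d <= hi if inclusive_end else d < hi
    return above and below


dates = [f"{year}-01-01" for year in range(2010, 2020)]
print([d for d in dates
       if in_date_range(d, "2016-12-31", "2017-01-02",
                        inclusive_start=True, inclusive_end=False)])
# ['2017-01-01']
```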

Query by Numeric Range Query

We can search for documents that are within a range for a numeric field like metadata.rating.

query="Any mention about independence?"
results= vector_store.similarity_search_with_score(
query,
search_options={
"query":{
"min":3,
"max":5,
"inclusive_min":True,
"inclusive_max":True,
"field":"metadata.rating",
}
},
)
print(results[0])
(Document(id='3a90405c0f5b4c09a6646259678f1f61', metadata={'author': 'John Doe', 'date': '2014-01-01', 'rating': 5, 'source': '../../how_to/state_of_the_union.txt'}, page_content='In this Capitol, generation after generation, Americans have debated great questions amid great strife, and have done great things. \n\nWe have fought for freedom, expanded liberty, defeated totalitarianism and terror. \n\nAnd built the strongest, freest, and most prosperous nation the world has ever known. \n\nNow is the hour. \n\nOur moment of responsibility. \n\nOur test of resolve and conscience, of history itself.'), 0.3573387440020518)

Combining Multiple Search Queries

Different search queries can be combined using AND (conjuncts) or OR (disjuncts) operators.

In this example, we are checking for documents with a rating between 3 and 4 (inclusive) and a date between 2016-12-31 and 2017-01-02.

query="Any mention about independence?"
results= vector_store.similarity_search_with_score(
query,
search_options={
"query":{
"conjuncts":[
{"min":3,"max":4,"inclusive_max":True,"field":"metadata.rating"},
{"start":"2016-12-31","end":"2017-01-02","field":"metadata.date"},
]
}
},
)
print(results[0])
(Document(id='7115a704877a46ad94d661dd9c81cbc3', metadata={'author': 'Jane Doe', 'date': '2017-01-01', 'rating': 3, 'source': '../../how_to/state_of_the_union.txt'}, page_content='And with 75% of adult Americans fully vaccinated and hospitalizations down by 77%, most Americans can remove their masks, return to work, stay in the classroom, and move forward safely. \n\nWe achieved this because we provided free vaccines, treatments, tests, and masks. \n\nOf course, continuing this costs money. \n\nI will soon send Congress a request. \n\nThe vast majority of Americans have used these tools and may want to again, so I expect Congress to pass it quickly.'), 0.6898253780130769)
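To make the AND/OR semantics concrete, the sketch below evaluates the same conjuncts dictionary against plain metadata dicts in Python. The matches function is a hypothetical local illustration, not Couchbase's query engine. It assumes that omitted inclusive_* flags mean an inclusive lower bound and an exclusive upper bound, and it ignores the optional minimum-match count that disjunction queries can carry.

```python
from datetime import date


def _within(value, lo, hi, inc_lo, inc_hi):
    """Range check where a missing bound (None) is treated as open."""
    if lo is not None and not (value >= lo if inc_lo else value > lo):
        return False
    if hi is not None and not (value <= hi if inc_hi else value < hi):
        return False
    return True


def matches(query, meta):
    """Evaluate conjuncts/disjuncts of numeric and date ranges against flat metadata."""
    if "conjuncts" in query:  # AND
        return all(matches(q, meta) for q in query["conjuncts"])
    if "disjuncts" in query:  # OR
        return any(matches(q, meta) for q in query["disjuncts"])
    value = meta[query["field"].split(".")[-1]]  # "metadata.rating" -> "rating"
    if "start" in query or "end" in query:  # date range
        to_date = date.fromisoformat
        return _within(
            to_date(value),
            to_date(query["start"]) if "start" in query else None,
            to_date(query["end"]) if "end" in query else None,
            query.get("inclusive_start", True),
            query.get("inclusive_end", False),
        )
    return _within(  # numeric range
        value, query.get("min"), query.get("max"),
        query.get("inclusive_min", True), query.get("inclusive_max", False),
    )


combined = {
    "conjuncts": [
        {"min": 3, "max": 4, "inclusive_max": True, "field": "metadata.rating"},
        {"start": "2016-12-31", "end": "2017-01-02", "field": "metadata.date"},
    ]
}
print(matches(combined, {"date": "2017-01-01", "rating": 3}))  # True
print(matches(combined, {"date": "2014-01-01", "rating": 5}))  # False
```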

Note

The hybrid search results might contain documents that do not satisfy all the search parameters. This is due to the way the scoring is calculated. The score is the sum of the Vector Search score and the scores of the queries in the hybrid search. If the Vector Search score is high, the combined score can exceed that of results that match all the queries in the hybrid search. To avoid such results, please use the filter parameter instead of hybrid search.
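A toy illustration of this note, using made-up scores: when ranking uses the summed score, a chunk with a strong vector match but no query match can come out on top. Applying the query as a pre-filter instead removes that chunk before ranking. This is a simplified model, not Couchbase's actual scoring, and none of the numbers come from Couchbase.

```python
from collections import namedtuple

Hit = namedtuple("Hit", "name vector_score query_score matches_query")

hits = [
    Hit("A", 0.60, 0.00, False),  # strong vector match, fails the hybrid query
    Hit("B", 0.30, 0.20, True),
    Hit("C", 0.25, 0.20, True),
]

# Hybrid search (simplified): rank by vector score + query score.
hybrid = sorted(hits, key=lambda h: h.vector_score + h.query_score, reverse=True)
print([h.name for h in hybrid])  # ['A', 'B', 'C']

# Filter (simplified): drop non-matching hits first, then rank by vector score.
prefiltered = sorted(
    (h for h in hits if h.matches_query), key=lambda h: h.vector_score, reverse=True
)
print([h.name for h in prefiltered])  # ['B', 'C']
```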

Combining Hybrid Search Query with Filters

Hybrid search can be combined with filters to get the benefits of both: the filter restricts the results to documents that match your requirements, and the hybrid query tunes their scores.

In this example, we are checking for documents with a rating between 3 & 5 and matching the string "independence" in the text field.

filter_text= search.MatchQuery("independence", field="text")

query="Any mention about independence?"
results= vector_store.similarity_search_with_score(
query,
search_options={
"query":{
"min":3,
"max":5,
"inclusive_min":True,
"inclusive_max":True,
"field":"metadata.rating",
}
},
filter=filter_text,
)

print(results[0])
(Document(id='23bb51b4e4d54a94ab0a95e72be8428c', metadata={'author': 'John Doe', 'date': '2012-01-01', 'rating': 3, 'source': '../../how_to/state_of_the_union.txt'}, page_content='And we remain clear-eyed. The Ukrainians are fighting back with pure courage. But the next few days weeks, months, will be hard on them.  \n\nPutin has unleashed violence and chaos.  But while he may make gains on the battlefield – he will pay a continuing high price over the long run. \n\nAnd a proud Ukrainian people, who have known 30 years  of independence, have repeatedly shown that they will not tolerate anyone who tries to take their country backwards.'), 0.30549919644400614)

Other Queries

Similarly, you can use any of the supported query types, such as Geo Distance, Polygon Search, Wildcard, and Regular Expressions, in the search_options parameter. Please refer to the documentation for more details on the available query types and their syntax.

Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:

Frequently Asked Questions

Question: Should I create the Search index before creating the CouchbaseSearchVectorStore object?

Yes, currently you need to create the Search index before creating the CouchbaseSearchVectorStore object.

Question: I am not seeing all the fields that I specified in my search results.

In Couchbase, we can only return the fields stored in the Search index. Please ensure that the field that you are trying to access in the search results is part of the Search index.

One way to handle this is to index and store a document's fields dynamically in the index.

  • In Capella, you need to go to "Advanced Mode" then under the chevron "General Settings" you can check "[X] Store Dynamic Fields" or "[X] Index Dynamic Fields"
  • In Couchbase Server, in the Index Editor (not Quick Editor) under the chevron "Advanced" you can check "[X] Store Dynamic Fields" or "[X] Index Dynamic Fields"

Note that these options will increase the size of the index.

For more details on dynamic mappings, please refer to the documentation.

Question: I am unable to see the metadata object in my search results.

This is most likely due to the metadata field in the document not being indexed and/or stored by the Couchbase Search index. In order to index the metadata field in the document, you need to add it to the index as a child mapping.

If you select to map all the fields in the mapping, you will be able to search by all metadata fields. Alternatively, to optimize the index, you can select the specific fields inside the metadata object to be indexed. You can refer to the docs to learn more about indexing child mappings.

(Figure: Creating Child Mappings)

Question: What is the difference between filter and search_options / hybrid queries?

Filters are pre-filters that restrict the documents searched in a Search index. They are available in Couchbase Server 7.6.4 and higher.

Hybrid Queries are additional search queries that can be used to tune the results being returned from the search index.

Both filters and hybrid search queries have the same capabilities with slightly different syntax. Filters are SearchQuery objects, while the hybrid search queries are dictionaries.

API reference

For detailed documentation of all CouchbaseSearchVectorStore features and configurations, head to the API reference.
