
SingleStoreVectorStore

SingleStore is a high-performance distributed SQL database designed to run in both cloud and on-premises environments. It combines a versatile feature set with flexible deployment options.

A standout feature of SingleStore is its support for vector storage and operations, which makes it a good fit for applications that need AI capabilities such as text similarity matching. Built-in vector functions like dot_product and euclidean_distance let developers implement similarity-based algorithms efficiently.
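
To make the two measures concrete, here is a minimal pure-Python sketch of what a dot product and a Euclidean distance compute. It is an illustration only: the function names mirror the SQL functions mentioned above, but this is not how SingleStore implements them, and the toy vectors are made up.

```python
import math
from typing import Sequence


def dot_product(a: Sequence[float], b: Sequence[float]) -> float:
    """Scalar product; for unit-length vectors, larger means more similar."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(x * y for x, y in zip(a, b))


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Straight-line distance; smaller means closer."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


# Toy unit-length vectors (illustrative values only)
query = [1.0, 0.0]
near = [0.8, 0.6]
far = [0.0, 1.0]

print(dot_product(query, near), dot_product(query, far))
print(euclidean_distance(query, near), euclidean_distance(query, far))
```

Note that the two measures rank in opposite directions: a good match has a high dot product but a low Euclidean distance.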

For developers who want to use vector data within SingleStore, a tutorial on working with vector data is available. It covers the Vector Store within SingleStoreDB and shows how to run searches based on vector similarity. With vector indexes, these queries execute quickly and return relevant data fast.

Moreover, SingleStore's Vector Store integrates with full-text indexing based on Lucene, enabling powerful text similarity searches. Users can filter search results based on selected fields of document metadata objects, enhancing query precision.

What sets SingleStore apart is its ability to combine vector and full-text searches in various ways, offering flexibility and versatility. Whether prefiltering by text or vector similarity and selecting the most relevant data, or employing a weighted sum approach to compute a final similarity score, developers have multiple options at their disposal.

In essence, SingleStore provides a comprehensive solution for managing and querying vector data, offering unparalleled performance and flexibility for AI-driven applications.

Class                    Package                 JS support
SingleStoreVectorStore   langchain_singlestore
Note: For the langchain-community version, SingleStoreDB (deprecated), see the v0.2 documentation.

Setup

To access SingleStore vector stores you'll need to install the langchain-singlestore integration package.

%pip install -qU "langchain-singlestore"

Initialization

To initialize SingleStoreVectorStore, you need an Embeddings object and connection parameters for the SingleStore database.

Required Parameters:

  • embedding (Embeddings): A text embedding model.

Optional Parameters:

  • distance_strategy (DistanceStrategy): Strategy for calculating vector distances. Defaults to DOT_PRODUCT. Options:

    • DOT_PRODUCT: Computes the scalar product of two vectors.
    • EUCLIDEAN_DISTANCE: Computes the Euclidean distance between two vectors.
  • table_name (str): Name of the table. Defaults to embeddings.

  • content_field (str): Field for storing content. Defaults to content.

  • metadata_field (str): Field for storing metadata. Defaults to metadata.

  • vector_field (str): Field for storing vectors. Defaults to vector.

  • id_field (str): Field for storing IDs. Defaults to id.

  • use_vector_index (bool): Enables vector indexing (requires SingleStore 8.5+). Defaults to False.

  • vector_index_name (str): Name of the vector index. Ignored if use_vector_index is False.

  • vector_index_options (dict): Options for the vector index. Ignored if use_vector_index is False.

  • vector_size (int): Size of the vector. Required if use_vector_index is True.

  • use_full_text_search (bool): Enables full-text indexing on content. Defaults to False.

Connection Pool Parameters:

  • pool_size (int): Number of active connections in the pool. Defaults to 5.
  • max_overflow (int): Maximum connections beyond pool_size. Defaults to 10.
  • timeout (float): Connection timeout in seconds. Defaults to 30.

Database Connection Parameters:

  • host (str): Hostname, IP, or URL for the database.
  • user (str): Database username.
  • password (str): Database password.
  • port (int): Database port. Defaults to 3306.
  • database (str): Database name.

Additional Options:

  • pure_python (bool): Enables pure Python mode.
  • local_infile (bool): Allows local file uploads.
  • charset (str): Character set for string values.
  • ssl_key, ssl_cert, ssl_ca (str): Paths to SSL files.
  • ssl_disabled (bool): Disables SSL.
  • ssl_verify_cert (bool): Verifies server's certificate.
  • ssl_verify_identity (bool): Verifies server's identity.
  • autocommit (bool): Enables autocommits.
  • results_type (str): Structure of query results (e.g., tuples, dicts).
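
import os

from langchain_singlestore.vectorstores import SingleStoreVectorStore

# Connection URL in the form user:password@host:port/database
os.environ["SINGLESTOREDB_URL"] = "root:pass@localhost:3306/db"

# `embeddings` is any Embeddings instance created beforehand;
# the keyword is `embedding`, as listed under Required Parameters.
vector_store = SingleStoreVectorStore(embedding=embeddings)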
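
As an alternative to the environment variable, the connection and index parameters listed above can be collected in one place before constructing the store. The sketch below is a hypothetical helper (build_store_options is not part of langchain-singlestore). It only uses parameter names and defaults from the lists above and enforces the documented rule that vector_size is needed when use_vector_index is True. Passing the result to the constructor is shown as a comment because it needs a running database.

```python
from typing import Any, Dict, Optional


def build_store_options(
    *,
    host: str,
    user: str,
    password: str,
    database: str,
    port: int = 3306,
    table_name: str = "embeddings",
    use_vector_index: bool = False,
    vector_size: Optional[int] = None,
) -> Dict[str, Any]:
    """Collect documented parameters into one dict (hypothetical helper)."""
    if use_vector_index and vector_size is None:
        # The parameter list marks vector_size as required with an index.
        raise ValueError("vector_size is required when use_vector_index=True")
    options: Dict[str, Any] = {
        "host": host,
        "user": user,
        "password": password,
        "database": database,
        "port": port,
        "table_name": table_name,
        "use_vector_index": use_vector_index,
    }
    if use_vector_index:
        options["vector_size"] = vector_size
    return options


options = build_store_options(
    host="localhost",
    user="root",
    password="pass",
    database="db",
    use_vector_index=True,
    vector_size=1536,
)
print(options)

# Assumed usage, not executed here because it needs a running database:
# vector_store = SingleStoreVectorStore(embedding=embeddings, **options)
```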

Manage vector store

The SingleStoreVectorStore assumes that a Document's ID is an integer. Below are examples of how to manage the vector store.

Add items to vector store

You can add documents to the vector store as follows:

%pip install -qU langchain-core

from langchain_core.documents import Document

docs = [
    Document(
        page_content="""In the parched desert, a sudden rainstorm brought relief,
as the droplets danced upon the thirsty earth, rejuvenating the landscape
with the sweet scent of petrichor.""",
        metadata={"category": "rain"},
    ),
    Document(
        page_content="""Amidst the bustling cityscape, the rain fell relentlessly,
creating a symphony of pitter-patter on the pavement, while umbrellas
bloomed like colorful flowers in a sea of gray.""",
        metadata={"category": "rain"},
    ),
    Document(
        page_content="""High in the mountains, the rain transformed into a delicate
mist, enveloping the peaks in a mystical veil, where each droplet seemed to
whisper secrets to the ancient rocks below.""",
        metadata={"category": "rain"},
    ),
    Document(
        page_content="""Blanketing the countryside in a soft, pristine layer, the
snowfall painted a serene tableau, muffling the world in a tranquil hush
as delicate flakes settled upon the branches of trees like nature's own
lacework.""",
        metadata={"category": "snow"},
    ),
    Document(
        page_content="""In the urban landscape, snow descended, transforming
bustling streets into a winter wonderland, where the laughter of
children echoed amidst the flurry of snowballs and the twinkle of
holiday lights.""",
        metadata={"category": "snow"},
    ),
    Document(
        page_content="""Atop the rugged peaks, snow fell with an unyielding
intensity, sculpting the landscape into a pristine alpine paradise,
where the frozen crystals shimmered under the moonlight, casting a
spell of enchantment over the wilderness below.""",
        metadata={"category": "snow"},
    ),
]

# Embed the documents and store them in the table
vector_store.add_documents(docs)
API Reference: Document

Update items in vector store

To update an existing document in the vector store, use the following code:

updated_document = Document(
    page_content="qux", metadata={"source": "https://another-example.com"}
)

vector_store.update_documents(document_id="1", document=updated_document)

Delete items from vector store

To delete documents from the vector store, use the following code:

vector_store.delete(ids=["3"])
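
Because the store assumes integer document IDs, it can help to check IDs before passing them to update or delete calls. The following is a hypothetical pre-check (validate_integer_ids is not part of the library), and it makes no claim about how the store itself reacts to a non-integer ID.

```python
from typing import Iterable, List


def validate_integer_ids(ids: Iterable[str]) -> List[str]:
    """Return the IDs unchanged if each one parses as an integer."""
    checked: List[str] = []
    for doc_id in ids:
        try:
            int(doc_id)
        except (TypeError, ValueError):
            raise ValueError(f"document ID {doc_id!r} is not an integer") from None
        checked.append(doc_id)
    return checked


ids_to_delete = validate_integer_ids(["3"])
print(ids_to_delete)

# Assumed usage with the store from the examples above:
# vector_store.delete(ids=ids_to_delete)
```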

Query vector store

Once your vector store has been created and the relevant documents have been added you will most likely wish to query it during the running of your chain or agent.

Query directly

Performing a simple similarity search can be done as follows:

results = vector_store.similarity_search(query="trees in the snow", k=1)
for doc in results:
    print(f"* {doc.page_content} [{doc.metadata}]")

If you want to execute a similarity search and receive the corresponding scores you can run:

results = vector_store.similarity_search_with_score(query="trees in the snow", k=1)
for doc, score in results:
    # Print the score with three decimal places
    print(f"* [SIM={score:.3f}] {doc.page_content} [{doc.metadata}]")
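
To show what k and the returned score mean, here is a small in-memory sketch of a top-k search ranked by dot product. It is a toy model under stated assumptions: the vectors are invented two-dimensional values rather than real embeddings, and the real store runs this ranking inside the database.

```python
from typing import Dict, List, Sequence, Tuple


def top_k_by_dot_product(
    query: Sequence[float],
    vectors: Dict[str, Sequence[float]],
    k: int = 1,
) -> List[Tuple[str, float]]:
    """Return the k (name, score) pairs with the highest dot product."""
    scored = [
        (name, sum(q * v for q, v in zip(query, vec)))
        for name, vec in vectors.items()
    ]
    # Highest score first, because a larger dot product means a better match
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Invented toy vectors standing in for document embeddings
toy_vectors = {
    "snow on branches": [0.9, 0.1],
    "rain in the city": [0.1, 0.9],
    "mountain mist": [0.5, 0.5],
}

results = top_k_by_dot_product([1.0, 0.0], toy_vectors, k=2)
for name, score in results:
    print(f"* [SIM={score:.3f}] {name}")
```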

Metadata filtering

SingleStoreDB lets you refine search results by prefiltering on metadata fields. Filtering on specific metadata attributes narrows a query to the relevant subset of data, so the results match your requirements more precisely.

query = "trees branches"
docs = vector_store.similarity_search(
    query, filter={"category": "snow"}
)  # Find documents that correspond to the query and have category "snow"
print(docs[0].page_content)
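
Conceptually, a metadata prefilter keeps only the records whose metadata contains every key/value pair in the filter, and similarity ranking then runs on what remains. The sketch below models that matching step in plain Python. The matches_filter helper is hypothetical, handles only flat key/value filters, and does not reflect how the store evaluates the filter in SQL.

```python
from typing import Any, Dict, List


def matches_filter(metadata: Dict[str, Any], filter_: Dict[str, Any]) -> bool:
    """True when every key/value pair in filter_ is present in metadata."""
    return all(
        key in metadata and metadata[key] == value
        for key, value in filter_.items()
    )


records: List[Dict[str, Any]] = [
    {"text": "snow on the branches", "metadata": {"category": "snow"}},
    {"text": "rain on the pavement", "metadata": {"category": "rain"}},
    {"text": "snow in the streets", "metadata": {"category": "snow"}},
]

# Only the matching records would go on to similarity ranking
candidates = [
    record for record in records
    if matches_filter(record["metadata"], {"category": "snow"})
]
print([record["text"] for record in candidates])
```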

Vector index

With SingleStore DB version 8.5 or above, you can speed up searches by using ANN vector indexes. To activate this feature, set use_vector_index=True when creating the vector store object. If your vectors differ in dimensionality from the default OpenAI embedding size of 1536, be sure to specify the vector_size parameter accordingly.
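
If you are unsure of your model's dimensionality, one option is to embed a short probe string and measure the result. The sketch below assumes the standard LangChain Embeddings method embed_query. The detect_vector_size helper and the fake embedding function are hypothetical stand-ins so that the example runs without a model or database.

```python
from typing import Callable, List


def detect_vector_size(embed_query: Callable[[str], List[float]]) -> int:
    """Embed a short probe string and return the vector length."""
    return len(embed_query("dimension probe"))


def fake_embed_query(text: str) -> List[float]:
    # Stand-in for a real model; always returns a 512-dimensional vector.
    return [0.0] * 512


vector_size = detect_vector_size(fake_embed_query)
print(vector_size)

# Assumed usage with a real model (requires a running database):
# vector_store = SingleStoreVectorStore(
#     embedding=embeddings,
#     use_vector_index=True,
#     vector_size=detect_vector_size(embeddings.embed_query),
# )
```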

Search strategies

SingleStoreDB offers several search strategies, each suited to different use cases:

  • VECTOR_ONLY (default): Uses vector operations such as dot_product or euclidean_distance to calculate similarity scores directly between vectors.
  • TEXT_ONLY: Uses Lucene-based full-text search, which is particularly useful for text-centric applications.
  • FILTER_BY_TEXT: First filters results by text similarity, then compares vectors. Requires a full-text index.
  • FILTER_BY_VECTOR: First filters results by vector similarity, then assesses text similarity to find the best matches. Requires a full-text index.
  • WEIGHTED_SUM: Calculates the final similarity score as a weighted sum of vector and text similarities. Works only with dot_product distance calculations and requires a full-text index.

The hybrid strategies (FILTER_BY_TEXT, FILTER_BY_VECTOR, and WEIGHTED_SUM) combine vector and text-based searches, so you can tune retrieval to your needs.

from langchain_singlestore.vectorstores import DistanceStrategy

docsearch = SingleStoreVectorStore.from_documents(
    docs,
    embeddings,
    distance_strategy=DistanceStrategy.DOT_PRODUCT,  # Use dot product for similarity search
    use_vector_index=True,  # Use vector index for faster search
    use_full_text_search=True,  # Use full text index
)

vectorResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreVectorStore.SearchStrategy.VECTOR_ONLY,
    filter={"category": "rain"},
)
print(vectorResults[0].page_content)

textResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreVectorStore.SearchStrategy.TEXT_ONLY,
)
print(textResults[0].page_content)

filteredByTextResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreVectorStore.SearchStrategy.FILTER_BY_TEXT,
    filter_threshold=0.1,
)
print(filteredByTextResults[0].page_content)

filteredByVectorResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreVectorStore.SearchStrategy.FILTER_BY_VECTOR,
    filter_threshold=0.1,
)
print(filteredByVectorResults[0].page_content)

weightedSumResults = docsearch.similarity_search(
    "rainstorm in parched desert, rain",
    k=1,
    search_strategy=SingleStoreVectorStore.SearchStrategy.WEIGHTED_SUM,
    text_weight=0.2,
    vector_weight=0.8,
)
print(weightedSumResults[0].page_content)
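
To build intuition for the hybrid strategies, here is a toy in-memory model of FILTER_BY_TEXT and WEIGHTED_SUM. Everything in it is an assumption made for illustration: the text score is a simple word-overlap fraction rather than a Lucene score, the vectors are invented, and whether the real threshold comparison is strict or inclusive is not stated in this document. Only the parameter names filter_threshold, text_weight, and vector_weight come from the examples above.

```python
from typing import List, Sequence, Tuple

Row = Tuple[str, Sequence[float]]  # (text, embedding)


def text_score(query: str, text: str) -> float:
    """Toy text relevance: fraction of query words found in the text."""
    words = query.lower().split()
    if not words:
        return 0.0
    text_words = set(text.lower().split())
    return sum(word in text_words for word in words) / len(words)


def vector_score(query_vec: Sequence[float], vec: Sequence[float]) -> float:
    """Dot product between the query vector and a row vector."""
    return sum(q * v for q, v in zip(query_vec, vec))


def filter_by_text(
    rows: List[Row],
    query: str,
    query_vec: Sequence[float],
    filter_threshold: float,
    k: int = 1,
) -> List[str]:
    """Keep rows whose text score exceeds the threshold, then rank by vector score."""
    kept = [row for row in rows if text_score(query, row[0]) > filter_threshold]
    kept.sort(key=lambda row: vector_score(query_vec, row[1]), reverse=True)
    return [row[0] for row in kept[:k]]


def weighted_sum(
    rows: List[Row],
    query: str,
    query_vec: Sequence[float],
    text_weight: float,
    vector_weight: float,
    k: int = 1,
) -> List[str]:
    """Rank rows by text_weight * text score + vector_weight * vector score."""
    ranked = sorted(
        rows,
        key=lambda row: text_weight * text_score(query, row[0])
        + vector_weight * vector_score(query_vec, row[1]),
        reverse=True,
    )
    return [row[0] for row in ranked[:k]]


rows: List[Row] = [
    ("rainstorm in the desert", [0.9, 0.1]),
    ("rain in the city", [0.7, 0.3]),
    ("snow in the mountains", [0.1, 0.9]),
]
query, query_vec = "desert rain", [1.0, 0.0]

print(filter_by_text(rows, query, query_vec, filter_threshold=0.1))
print(weighted_sum(rows, query, query_vec, text_weight=0.2, vector_weight=0.8))
```

In this toy data the snow row has no query words in it, so FILTER_BY_TEXT drops it before any vector comparison, while WEIGHTED_SUM still scores it but ranks it last.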

Query by turning into retriever

You can also transform the vector store into a retriever for easier usage in your chains.

retriever = vector_store.as_retriever(search_kwargs={"k": 1})
retriever.invoke("trees in the snow")

Multi-modal Example: Leveraging CLIP and OpenClip Embeddings

In multi-modal data analysis, integrating different information types such as images and text has become increasingly important. One tool for this is CLIP, a model that embeds both images and text into a shared semantic space. This makes it possible to retrieve relevant content across modalities through similarity search.

To illustrate, consider an application that needs to analyze multi-modal data. This example uses OpenClip multimodal embeddings, which build on CLIP's framework. With OpenClip, you can embed textual descriptions alongside their corresponding images, which supports tasks such as finding visually similar images from a text query or finding text passages associated with specific visual content.

%pip install -U langchain openai langchain-singlestore langchain-experimental

import os

from langchain_experimental.open_clip import OpenCLIPEmbeddings
from langchain_singlestore.vectorstores import SingleStoreVectorStore

os.environ["SINGLESTOREDB_URL"] = "root:pass@localhost:3306/db"

TEST_IMAGES_DIR = "../../modules/images"

docsearch = SingleStoreVectorStore(OpenCLIPEmbeddings())

# Collect the .jpg files in the test directory, sorted by path
image_uris = sorted(
    [
        os.path.join(TEST_IMAGES_DIR, image_name)
        for image_name in os.listdir(TEST_IMAGES_DIR)
        if image_name.endswith(".jpg")
    ]
)

# Add images
docsearch.add_images(uris=image_uris)
API Reference: OpenCLIPEmbeddings

Usage for retrieval-augmented generation

For guides on how to use this vector store for retrieval-augmented generation (RAG), see the following sections:

API reference

For detailed documentation of all SingleStoreVectorStore features and configurations, head to the GitHub page: https://github.com/singlestore-labs/langchain-singlestore/


