DingoDB
DingoDB is a distributed multi-mode vector database, which combines the characteristics of data lakes and vector databases, and can store data of any type and size (Key-Value, PDF, audio, video, etc.). It has real-time low-latency processing capabilities to achieve rapid insight and response, and can efficiently conduct instant analysis and process multi-modal data.
You'll need to install langchain-community with pip install -qU langchain-community to use this integration.
This notebook shows how to use functionality related to the DingoDB vector database.
To run, you should have a DingoDB instance up and running.
%pip install --upgrade --quiet dingodb
# or install latest:
%pip install --upgrade --quiet git+https://git@github.com/dingodb/pydingo.git
We want to use OpenAIEmbeddings so we have to get the OpenAI API Key.
import getpass
import os

if "OPENAI_API_KEY" not in os.environ:
    os.environ["OPENAI_API_KEY"] = getpass.getpass("OpenAI API Key:")
OpenAI API Key:········
from langchain_community.document_loaders import TextLoader
from langchain_community.vectorstores import Dingo
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter
loader = TextLoader("../../how_to/state_of_the_union.txt")
documents = loader.load()
text_splitter = CharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
docs = text_splitter.split_documents(documents)

embeddings = OpenAIEmbeddings()
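To make the chunk_size and chunk_overlap parameters concrete, here is a minimal sketch of fixed-window chunking in plain Python. The helper name split_fixed is hypothetical and is not part of LangChain. CharacterTextSplitter splits on a separator first and then merges the pieces, so its real chunk boundaries will differ from this simplified version.

```python
from typing import List


def split_fixed(text: str, chunk_size: int, chunk_overlap: int = 0) -> List[str]:
    """Cut text into windows of at most chunk_size characters.

    Hypothetical helper for illustration only; consecutive windows share
    chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in [0, chunk_size)")
    step = chunk_size - chunk_overlap  # how far each window advances
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


if __name__ == "__main__":
    # With no overlap the chunks tile the text; with overlap they share edges.
    print(split_fixed("abcdefghij", chunk_size=4, chunk_overlap=0))
    print(split_fixed("abcdefghij", chunk_size=4, chunk_overlap=1))
```

A chunk_overlap of 0, as used above, means no text is repeated between neighbouring chunks; a positive overlap trades some duplication for context that spans chunk boundaries.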
from dingodb import DingoDB

index_name = "langchain_demo"

dingo_client = DingoDB(user="", password="", host=["127.0.0.1:13000"])
# First, check if our index already exists. If it doesn't, we create it
if (
    index_name not in dingo_client.get_index()
    and index_name.upper() not in dingo_client.get_index()
):
    # we create a new index, modify to your own
    dingo_client.create_index(
        index_name=index_name, dimension=1536, metric_type="cosine", auto_id=False
    )

# The OpenAI embedding model `text-embedding-ada-002` uses 1536 dimensions
docsearch = Dingo.from_documents(
    docs, embeddings, client=dingo_client, index_name=index_name
)
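The existence check above compares the index name both as written and in upper case. A possible way to factor that into a reusable helper is sketched below. The function index_exists is hypothetical, and it assumes that get_index() returns an iterable of index name strings, as the membership test above suggests. It is slightly broader than the original check because it ignores case entirely.

```python
from typing import Iterable


def index_exists(index_name: str, existing: Iterable[str]) -> bool:
    """Return True if index_name matches any existing name, ignoring case.

    Hypothetical helper; `existing` stands in for the result of
    dingo_client.get_index(), assumed to be an iterable of strings.
    """
    wanted = index_name.lower()
    return any(name.lower() == wanted for name in existing)


if __name__ == "__main__":
    # Stand-in list of names instead of a live DingoDB client.
    names = ["LANGCHAIN_DEMO", "other_index"]
    print(index_exists("langchain_demo", names))
    print(index_exists("missing", names))
```

Calling get_index() once and passing the result to such a helper also avoids querying the server twice for a single check.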
query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query)
print(docs[0].page_content)
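Conceptually, a similarity search over an index created with metric_type="cosine" ranks stored vectors by their cosine similarity to the query embedding. The sketch below shows that idea as a brute-force search over toy 2-dimensional vectors. The names cosine_similarity and top_k are hypothetical, the default k is arbitrary, and a real DingoDB index would not necessarily scan every vector this way.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0  # convention chosen here for zero vectors
    return dot / (norm_a * norm_b)


def top_k(
    query_vec: Sequence[float], doc_vecs: Sequence[Sequence[float]], k: int = 4
) -> List[Tuple[int, float]]:
    """Return (index, score) pairs for the k most similar vectors, best first."""
    scored = [(i, cosine_similarity(query_vec, v)) for i, v in enumerate(doc_vecs)]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    # Toy vectors standing in for document embeddings.
    doc_vecs = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
    for idx, score in top_k([1.0, 0.0], doc_vecs, k=2):
        print(idx, round(score, 4))
```

The dimension check mirrors the requirement that the index dimension (1536 above) match the embedding model's output size.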
Adding More Text to an Existing Index
More text can be embedded and upserted to an existing Dingo index using the add_texts function.
vectorstore = Dingo(embeddings, "text", client=dingo_client, index_name=index_name)

vectorstore.add_texts(["More text!"])
Maximal Marginal Relevance Searches
In addition to using similarity search in the retriever object, you can also use mmr as the retriever's search type.
retriever = docsearch.as_retriever(search_type="mmr")
matched_docs = retriever.invoke(query)
for i, d in enumerate(matched_docs):
    print(f"\n## Document {i}\n")
    print(d.page_content)
Or use max_marginal_relevance_search directly:
found_docs = docsearch.max_marginal_relevance_search(query, k=2, fetch_k=10)
for i, doc in enumerate(found_docs):
    print(f"{i + 1}.", doc.page_content, "\n")
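For readers unfamiliar with maximal marginal relevance, the following is a minimal sketch of the standard greedy selection in plain Python. It is an illustration under stated assumptions, not the code LangChain or DingoDB runs: the function names mmr and _cos are hypothetical, and the toy vectors stand in for the fetch_k candidates from which k results are chosen. Each step picks the candidate that maximizes lambda_mult times its similarity to the query, minus (1 - lambda_mult) times its highest similarity to anything already selected.

```python
import math
from typing import List, Sequence


def _cos(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity; returns 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def mmr(
    query_vec: Sequence[float],
    cand_vecs: Sequence[Sequence[float]],
    k: int = 2,
    lambda_mult: float = 0.5,
) -> List[int]:
    """Greedy MMR: return indices of up to k candidates, in selection order."""
    relevance = [_cos(query_vec, v) for v in cand_vecs]
    selected: List[int] = []
    remaining = list(range(len(cand_vecs)))

    while remaining and len(selected) < k:

        def score(i: int) -> float:
            # Penalize closeness to the most similar already-selected item.
            redundancy = max(
                (_cos(cand_vecs[i], cand_vecs[j]) for j in selected), default=0.0
            )
            return lambda_mult * relevance[i] - (1.0 - lambda_mult) * redundancy

        best = max(remaining, key=score)
        selected.append(best)
        remaining.remove(best)
    return selected


if __name__ == "__main__":
    query = [1.0, 0.0]
    # Candidates 0 and 1 are near-duplicates; candidate 2 points elsewhere.
    candidates = [[1.0, 0.1], [1.0, 0.12], [0.8, -0.6]]
    print(mmr(query, candidates, k=2, lambda_mult=0.5))
    print(mmr(query, candidates, k=2, lambda_mult=1.0))
```

With lambda_mult set to 1.0 the redundancy term vanishes and the selection reduces to plain relevance ranking; lower values favour diversity among the returned documents.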
Related
- Vector store conceptual guide
- Vector store how-to guides