LangChain 🦜️🔗

LangChain is a framework designed for building applications with large language models (LLMs) by chaining together various components. It supports a range of functionalities including memory, agents, and chat models, enabling developers to create context-aware applications.

Illustration

LangChain streamlines these stages (in figure above) by providing pre-built components and tools for integration, memory management, and deployment, allowing developers to focus on application logic rather than underlying complexities.

Integrating LangChain with LanceDB enables applications to retrieve the most relevant data by comparing query vectors against stored vectors, facilitating effective information retrieval. This results in better, context-aware replies and actions from the LLM.

Quick Start

You can load your document data using LangChain's loaders. This example uses TextLoader, with OpenAIEmbeddings as the embedding model. Check out the complete example here - LangChain demo

```python
import os

from langchain.document_loaders import TextLoader
from langchain.vectorstores import LanceDB
from langchain_openai import OpenAIEmbeddings
from langchain_text_splitters import CharacterTextSplitter

os.environ["OPENAI_API_KEY"] = "sk-..."

loader = TextLoader("../../modules/state_of_the_union.txt")  # Replace with your data path
documents = loader.load()
documents = CharacterTextSplitter().split_documents(documents)

embeddings = OpenAIEmbeddings()
docsearch = LanceDB.from_documents(documents, embeddings)

query = "What did the president say about Ketanji Brown Jackson"
docs = docsearch.similarity_search(query)
print(docs[0].page_content)
```

Documentation

In the above example, the LanceDB vector store object is created using the from_documents() method, which is a classmethod that returns the initialized class object.

You can also use the LanceDB.from_texts(texts: List[str], embedding: Embeddings) class method.

The exhaustive list of parameters for the LanceDB vector store:

| Name | Type | Purpose | Default |
|---|---|---|---|
| connection | Optional[Any] | lancedb.db.LanceDBConnection connection object to use. If not provided, a new connection will be created. | None |
| embedding | Optional[Embeddings] | LangChain embedding model. | Provided by user |
| uri | Optional[str] | Directory location of the LanceDB database; a connection is established to it and used to interact with the database. | '/tmp/lancedb' |
| vector_key | Optional[str] | Column name to use for vectors in the table. | 'vector' |
| id_key | Optional[str] | Column name to use for ids in the table. | 'id' |
| text_key | Optional[str] | Column name to use for text in the table. | 'text' |
| table_name | Optional[str] | Name of your table in the database. | 'vectorstore' |
| api_key | Optional[str] | API key to use for the LanceDB Cloud database. | None |
| region | Optional[str] | Region to use for the LanceDB Cloud database (LanceDB Cloud only). | None |
| mode | Optional[str] | Mode to use for adding data to the table. Valid values are "append" and "overwrite". | 'overwrite' |
| table | Optional[Any] | An existing LanceDB table, created outside of LangChain, to connect to and use. | None |
| distance | Optional[str] | Distance metric used to calculate the similarity between vectors. | 'l2' |
| reranker | Optional[Any] | Reranker to use for LanceDB. | None |
| relevance_score_fn | Optional[Callable[[float], float]] | LangChain relevance score function to be used. | None |
| limit | int | Maximum number of results to return. | DEFAULT_K (4) |
```python
db_url = "db://lang_test"   # url of db you created
api_key = "xxxxx"           # your API key
region = "us-east-1-dev"    # your selected region

vector_store = LanceDB(
    uri=db_url,
    api_key=api_key,  # (don't include for local API)
    region=region,    # (don't include for local API)
    embedding=embeddings,
    table_name="langchain_test",  # Optional
)
```

Methods

add_texts()

This method turns texts into embeddings and adds them to the database.

| Name | Purpose | Default |
|---|---|---|
| texts | Iterable of strings to add to the vector store. | Provided by user |
| metadatas | Optional list of metadata dicts associated with the texts. | None |
| ids | Optional list of ids to associate with the texts. | None |
| kwargs | Other keyword arguments provided by the user. | - |

It returns a list of ids of the added texts.

```python
vector_store.add_texts(texts=["test_123"], metadatas=[{"source": "wiki"}])

# Additionally, to explore the table you can load it into a DataFrame
# or save it to a CSV file:
tbl = vector_store.get_table()
print("tbl:", tbl)
pd_df = tbl.to_pandas()
pd_df.to_csv("docsearch.csv", index=False)

# You can also create a new vector store object using an existing connection object:
vector_store = LanceDB(connection=tbl, embedding=embeddings)
```

create_index()

This method creates a scalar index (for non-vector columns) or a vector index on a table.

| Name | Type | Purpose | Default |
|---|---|---|---|
| vector_col | Optional[str] | Provide if you want to create an index on a vector column. | None |
| col_name | Optional[str] | Provide if you want to create an index on a non-vector column. | None |
| metric | Optional[str] | Metric to use for the vector index. Choice of metrics: 'l2', 'dot', 'cosine'. | 'l2' |
| num_partitions | Optional[int] | Number of partitions to use for the index. | 256 |
| num_sub_vectors | Optional[int] | Number of sub-vectors to use for the index. | 96 |
| index_cache_size | Optional[int] | Size of the index cache. | None |
| name | Optional[str] | Name of the table to create the index on. | None |

Before creating an index, make sure your table has enough data in it. An ANN index is usually not needed for datasets of ~100K vectors or fewer. For large-scale (>1M) or high-dimensional vectors, it is beneficial to create an ANN index.

```python
# for creating a vector index
vector_store.create_index(vector_col="vector", metric="cosine")

# for creating a scalar index (for non-vector columns)
vector_store.create_index(col_name="text")
```

similarity_search()

This method performs similarity search based on a text query.

| Name | Type | Purpose | Default |
|---|---|---|---|
| query | str | The text query to search for in the vector store. | N/A |
| k | Optional[int] | Number of documents to return. | None |
| filter | Optional[Dict[str, str]] | Filters the search results by specific metadata criteria. | None |
| fts | Optional[bool] | Whether to perform a full-text search (FTS). | False |
| name | Optional[str] | Name of the table to query. If not provided, uses the default table set during initialization of the LanceDB instance. | None |
| kwargs | Any | Other keyword arguments provided by the user. | N/A |

Returns documents most similar to the query, without relevance scores.

```python
docs = docsearch.similarity_search(query)
print(docs[0].page_content)
```

similarity_search_by_vector()

The method returns documents that are most similar to the specified embedding (query) vector.

| Name | Type | Purpose | Default |
|---|---|---|---|
| embedding | List[float] | The embedding vector to use to search for similar documents in the vector store. | N/A |
| k | Optional[int] | Number of documents to return. | None |
| filter | Optional[Dict[str, str]] | Filters the search results by specific metadata criteria. | None |
| name | Optional[str] | Name of the table to query. If not provided, uses the default table set during initialization of the LanceDB instance. | None |
| kwargs | Any | Other keyword arguments provided by the user. | N/A |

It does not provide relevance scores.

```python
query_embedding = embeddings.embed_query(query)
docs = docsearch.similarity_search_by_vector(query_embedding)
print(docs[0].page_content)
```

similarity_search_with_score()

Returns documents most similar to the query string along with their relevance scores.

| Name | Type | Purpose | Default |
|---|---|---|---|
| query | str | The text query to search for in the vector store. This query will be converted into an embedding using the specified embedding function. | N/A |
| k | Optional[int] | Number of documents to return. | None |
| filter | Optional[Dict[str, str]] | Filters the search results by specific metadata criteria, narrowing down the results based on metadata attributes associated with the documents. | None |
| kwargs | Any | Other keyword arguments provided by the user. | N/A |

It gets called by the base class's similarity_search_with_relevance_scores(), which selects the relevance score based on the _select_relevance_score_fn.

```python
docs = docsearch.similarity_search_with_relevance_scores(query)
print("relevance score -", docs[0][1])
print("text -", docs[0][0].page_content[:1000])
```

similarity_search_by_vector_with_relevance_scores()

Similarity search using a query vector.

| Name | Type | Purpose | Default |
|---|---|---|---|
| embedding | List[float] | The embedding vector to use to search for similar documents in the vector store. | N/A |
| k | Optional[int] | Number of documents to return. | None |
| filter | Optional[Dict[str, str]] | Filters the search results by specific metadata criteria. | None |
| name | Optional[str] | Name of the table to query. | None |
| kwargs | Any | Other keyword arguments provided by the user. | N/A |

The method returns documents most similar to the specified embedding (query) vector, along with their relevance scores.

```python
query_embedding = embeddings.embed_query(query)
docs = docsearch.similarity_search_by_vector_with_relevance_scores(query_embedding)
print("relevance score -", docs[0][1])
print("text -", docs[0][0].page_content[:1000])
```

max_marginal_relevance_search()

This method returns docs selected using maximal marginal relevance (MMR). MMR optimizes for similarity to the query and diversity among the selected documents.

| Name | Type | Purpose | Default |
|---|---|---|---|
| query | str | Text to look up documents similar to. | N/A |
| k | Optional[int] | Number of documents to return. | 4 |
| fetch_k | Optional[int] | Number of documents to fetch to pass to the MMR algorithm. | None |
| lambda_mult | float | Number between 0 and 1 that determines the degree of diversity among the results, with 0 corresponding to maximum diversity and 1 to minimum diversity. | 0.5 |
| filter | Optional[Dict[str, str]] | Filter by metadata. | None |
| kwargs | Any | Other keyword arguments provided by the user. | - |

Similarly, the max_marginal_relevance_search_by_vector() function returns docs most similar to the embedding passed to it, using MMR. Instead of a string query, you pass the embedding to be searched for.

```python
result = docsearch.max_marginal_relevance_search(query="text")
result_texts = [doc.page_content for doc in result]
print(result_texts)

# search by vector:
result = docsearch.max_marginal_relevance_search_by_vector(
    embeddings.embed_query("text")
)
result_texts = [doc.page_content for doc in result]
print(result_texts)
```

add_images()

This method adds images by automatically creating their embeddings and adding them to the vector store.

| Name | Type | Purpose | Default |
|---|---|---|---|
| uris | List[str] | File paths to the images. | N/A |
| metadatas | Optional[List[dict]] | Optional list of metadata dicts. | None |
| ids | Optional[List[str]] | Optional list of IDs. | None |

It returns a list of IDs of the added images.

```python
vec_store.add_images(uris=image_uris)  # image_uris are local filesystem paths to the images
```
