Llama-Index
Quick start
To use the integration, install it first:

    pip install llama-index-vector-stores-lancedb

You can then run the script below to try it out:
    import logging
    import sys

    # Uncomment to see debug logs
    # logging.basicConfig(stream=sys.stdout, level=logging.DEBUG)
    # logging.getLogger().addHandler(logging.StreamHandler(stream=sys.stdout))

    from llama_index.core import SimpleDirectoryReader, Document, StorageContext
    from llama_index.core import VectorStoreIndex
    from llama_index.vector_stores.lancedb import LanceDBVectorStore
    import textwrap
    import openai

    openai.api_key = "sk-..."

    documents = SimpleDirectoryReader("./data/your-data-dir/").load_data()
    print("Document ID:", documents[0].doc_id, "Document Hash:", documents[0].hash)

    ## For LanceDB cloud :
    # vector_store = LanceDBVectorStore(
    #     uri="db://db_name",  # your remote DB URI
    #     api_key="sk_..",  # lancedb cloud api key
    #     region="your-region"  # the region you configured
    #     ...
    # )

    vector_store = LanceDBVectorStore(
        uri="./lancedb", mode="overwrite", query_type="vector"
    )
    storage_context = StorageContext.from_defaults(vector_store=vector_store)

    index = VectorStoreIndex.from_documents(
        documents, storage_context=storage_context
    )
    lance_filter = "metadata.file_name = 'paul_graham_essay.txt' "
    retriever = index.as_retriever(vector_store_kwargs={"where": lance_filter})
    response = retriever.retrieve("What did the author do growing up?")
See the complete example in the LlamaIndex demo.
Filtering
For metadata filtering, you can use a Lance SQL-like string filter, as demonstrated in the example above. You can also filter using the MetadataFilters class from LlamaIndex:
    from llama_index.core.vector_stores import (
        MetadataFilters,
        FilterOperator,
        FilterCondition,
        MetadataFilter,
    )

    query_filters = MetadataFilters(
        filters=[
            MetadataFilter(
                key="creation_date", operator=FilterOperator.EQ, value="2024-05-23"
            ),
            MetadataFilter(
                key="file_size", value=75040, operator=FilterOperator.GT
            ),
        ],
        condition=FilterCondition.AND,
    )
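The string filter used in the quick start can also be assembled programmatically. The sketch below is a minimal illustration, not part of the integration: `lance_where` is a hypothetical helper, it only handles equality on string and numeric values, and it assumes that metadata fields are addressed as `metadata.<key>` (as in the quick-start filter) and that single quotes are escaped by doubling them, as in standard SQL.

```python
from typing import Any, Dict


def lance_where(conditions: Dict[str, Any]) -> str:
    """Build a Lance SQL-like equality filter (hypothetical helper)."""
    clauses = []
    for key, value in conditions.items():
        if isinstance(value, bool):
            # Booleans are left out on purpose; their literal form is not shown on this page.
            raise TypeError("boolean values are not handled by this sketch")
        if isinstance(value, (int, float)):
            literal = str(value)
        else:
            # Assumption: embedded single quotes are escaped by doubling them.
            literal = "'" + str(value).replace("'", "''") + "'"
        clauses.append(f"metadata.{key} = {literal}")
    # Combine all conditions so that every one of them must match.
    return " AND ".join(clauses)


lance_filter = lance_where({"file_name": "paul_graham_essay.txt"})
print(lance_filter)  # metadata.file_name = 'paul_graham_essay.txt'
```

The resulting string can be passed as the `where` entry of `vector_store_kwargs`, in the same way as the hand-written filter in the quick start.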
Hybrid Search
For complete documentation, refer to the hybrid search documentation. This example uses the colbert reranker. Make sure to install the necessary dependencies for the reranker you choose.
    from lancedb.rerankers import ColbertReranker

    reranker = ColbertReranker()
    vector_store._add_reranker(reranker)

    query_engine = index.as_query_engine(
        filters=query_filters,
        vector_store_kwargs={
            "query_type": "hybrid",
        },
    )

    response = query_engine.query("How much did Viaweb charge per month?")
As the snippet above shows, query_type can be set or overridden when you create the query engine or retriever.
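The per-query overrides shown on this page (`where` in the quick start, `query_type` here) both travel through `vector_store_kwargs`. A small sketch of collecting them in one place follows. `make_vector_store_kwargs` and `KNOWN_QUERY_TYPES` are hypothetical names, the allowed set only lists the query types that appear on this page, and whether both keys can be combined in a single call is an assumption that this page does not confirm.

```python
from typing import Any, Dict, Optional

# Assumption: only the query types that appear on this page are listed here.
KNOWN_QUERY_TYPES = {"vector", "hybrid"}


def make_vector_store_kwargs(
    query_type: Optional[str] = None, where: Optional[str] = None
) -> Dict[str, Any]:
    """Collect per-query overrides for vector_store_kwargs (hypothetical helper)."""
    kwargs: Dict[str, Any] = {}
    if query_type is not None:
        if query_type not in KNOWN_QUERY_TYPES:
            raise ValueError(f"Unexpected query_type: {query_type!r}")
        kwargs["query_type"] = query_type
    if where is not None:
        # A Lance SQL-like filter string, as in the quick start.
        kwargs["where"] = where
    return kwargs


print(make_vector_store_kwargs(query_type="hybrid"))  # {'query_type': 'hybrid'}
```

The returned dictionary would then be passed along, for example `index.as_query_engine(vector_store_kwargs=make_vector_store_kwargs(query_type="hybrid"))`.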
API reference
The full list of parameters for the LanceDBVectorStore vector store is:
- connection: Optional, lancedb.db.LanceDBConnection connection object to use. If not provided, a new connection will be created.
- uri: Optional[str], the URI of your database. Defaults to "/tmp/lancedb".
- table_name: Optional[str], name of your table in the database. Defaults to "vectors".
- table: Optional[Any], lancedb.db.LanceTable object to be passed. Defaults to None.
- vector_column_name: Optional[Any], column name to use for vectors in the table. Defaults to 'vector'.
- doc_id_key: Optional[str], column name to use for document IDs in the table. Defaults to 'doc_id'.
- text_key: Optional[str], column name to use for text in the table. Defaults to 'text'.
- api_key: Optional[str], API key to use for a LanceDB Cloud database. Defaults to None.
- region: Optional[str], region to use for a LanceDB Cloud database. Only for LanceDB Cloud. Defaults to None.
- nprobes: Optional[int], the number of probes to use. Only applicable if an ANN index has been created on the table; otherwise it is ignored. Defaults to 20.
- refine_factor: Optional[int], refines the results by reading extra elements and re-ranking them in memory. Defaults to None.
- reranker: Optional[Any], the reranker to use for LanceDB. Defaults to None.
- overfetch_factor: Optional[int], the factor by which to fetch more results. Defaults to 1.
- mode: Optional[str], the mode to use for LanceDB. Defaults to "overwrite".
- query_type: Optional[str], the type of query to use for LanceDB. Defaults to "vector".
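As a rough illustration of how these parameters fit together, the sketch below gathers the documented non-None defaults into a dictionary and merges caller overrides on top. `DOCUMENTED_DEFAULTS`, `NONE_DEFAULT_PARAMS`, `store_kwargs` and `build_store` are hypothetical names introduced here. Only `LanceDBVectorStore` and its keyword names come from this page, and the import is done lazily so the snippet loads without the integration installed.

```python
from typing import Any, Dict

# Non-None defaults copied from the parameter list above.
DOCUMENTED_DEFAULTS: Dict[str, Any] = {
    "uri": "/tmp/lancedb",
    "table_name": "vectors",
    "vector_column_name": "vector",
    "doc_id_key": "doc_id",
    "text_key": "text",
    "nprobes": 20,
    "overfetch_factor": 1,
    "mode": "overwrite",
    "query_type": "vector",
}

# Parameters that default to None or to a freshly created object.
NONE_DEFAULT_PARAMS = {
    "connection", "table", "api_key", "region", "refine_factor", "reranker",
}


def store_kwargs(**overrides: Any) -> Dict[str, Any]:
    """Merge overrides onto the documented defaults, rejecting unknown names."""
    unknown = set(overrides) - set(DOCUMENTED_DEFAULTS) - NONE_DEFAULT_PARAMS
    if unknown:
        raise TypeError(f"Unknown parameter(s): {sorted(unknown)}")
    return {**DOCUMENTED_DEFAULTS, **overrides}


def build_store(**overrides: Any):
    """Create the vector store; requires llama-index-vector-stores-lancedb."""
    from llama_index.vector_stores.lancedb import LanceDBVectorStore

    return LanceDBVectorStore(**store_kwargs(**overrides))
```

With the integration installed, `build_store(uri="./lancedb", table_name="essays")` would create a store that differs from the documented defaults only in those two settings.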
Methods
- from_table(cls, table: lancedb.db.LanceTable) -> LanceDBVectorStore: (class method) Creates an instance from a LanceDB table.
- _add_reranker(self, reranker: lancedb.rerankers.Reranker) -> None: Adds a reranker to an existing vector store. Usage: see the Hybrid Search example above.
- _table_exists(self, tbl_name: Optional[str] = None) -> bool: Returns True if tbl_name exists in the database.
- create_index(self, scalar: Optional[bool] = False, col_name: Optional[str] = None, num_partitions: Optional[int] = 256, num_sub_vectors: Optional[int] = 96, index_cache_size: Optional[int] = None, metric: Optional[str] = "l2") -> None: Creates a scalar index (for non-vector columns) or a vector index on a table. Make sure your vector column has enough data before creating an index on it.
- add(self, nodes: List[BaseNode], **add_kwargs: Any) -> List[str]: Adds nodes to the table.
- delete(self, ref_doc_id: str) -> None: Deletes nodes using ref_doc_id.
- delete_nodes(self, node_ids: List[str]) -> None: Deletes nodes using node_ids.
- query(self, query: VectorStoreQuery, **kwargs: Any) -> VectorStoreQueryResult: Queries the index (VectorStoreIndex) for the top k most similar nodes. Accepts a LlamaIndex VectorStoreQuery object.
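The `delete` and `add` methods listed above can be combined to replace the nodes of a single source document. The function below is a minimal sketch under stated assumptions: `replace_document_nodes` is a hypothetical helper, `store` is expected to be a LanceDBVectorStore (or any object exposing the same two methods), and `nodes` are already-built LlamaIndex nodes. How `add` interacts with the store's mode setting (default "overwrite") is not described on this page, so it is worth verifying on a scratch table first.

```python
from typing import Any, List, Sequence


def replace_document_nodes(
    store: Any, ref_doc_id: str, nodes: Sequence[Any]
) -> List[str]:
    """Drop the nodes stored for ref_doc_id, then add the replacement nodes."""
    # Remove every node that was derived from the given source document.
    store.delete(ref_doc_id)
    # Insert the new nodes; add() returns the IDs of the added nodes.
    return store.add(list(nodes))
```

Because the helper only relies on the two documented method signatures, it can be exercised with a stub object before it is pointed at a real table.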