Genkit
genkitx-lancedb
This is a LanceDB plugin for the Genkit framework. It allows you to use LanceDB for ingesting and retrieving data with Genkit.
Installation
Usage
Adding LanceDB plugin to your genkit instance.
import { lancedbIndexerRef, lancedb, lancedbRetrieverRef, WriteMode } from 'genkitx-lancedb';
import { textEmbedding004, vertexAI, gemini } from '@genkit-ai/vertexai';
import { z, genkit } from 'genkit';
import { Document } from 'genkit/retriever';
import { chunk } from 'llm-chunk';
import { readFile } from 'fs/promises';
import path from 'path';
import pdf from 'pdf-parse/lib/pdf-parse';

const ai = genkit({
  plugins: [
    // vertexAI provides the textEmbedding004 embedder
    vertexAI(),

    // the local vector store requires an embedder to translate from text to vector
    lancedb([
      {
        dbUri: '.db', // optional lancedb uri, default to .db
        tableName: 'table', // optional table name, default to table
        embedder: textEmbedding004,
      },
    ]),
  ],
});
You can run this app with the following command:
This adds LanceDB as a retriever and an indexer to the genkit instance. You can see both in the GUI view.
Testing retrieval on a sample table
Let's see the raw retrieval results.
On running this query, you'll see 5 results fetched from the lancedb table, where each result looks something like this:
Creating a custom RAG flow
Now that we've seen how you can use LanceDB in a genkit pipeline, let's refine the flow and create a RAG. A RAG flow consists of an indexer and a retriever, with the retriever's outputs postprocessed and fed into an LLM to produce the final response.
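The postprocessing step of such a flow can be sketched without any library: take the retrieved text chunks and the user's question and assemble one grounded prompt. This is only an illustration; the `buildRagPrompt` helper and the sample menu strings are hypothetical and are not part of genkit or genkitx-lancedb.

```typescript
// Hypothetical helper: turn retrieved text chunks plus the user's question
// into a single grounded prompt string for the LLM.
function buildRagPrompt(question: string, contexts: string[]): string {
  // Fall back to a placeholder when nothing was retrieved.
  const context = contexts.length > 0 ? contexts.join('\n\n') : 'No content found';
  return [
    'Use only the context provided to answer the question.',
    "If you don't know, do not make up an answer.",
    '',
    'Context:',
    context,
    '',
    `Question: ${question}`,
  ].join('\n');
}

// Toy inputs, made up for illustration only.
const ragPrompt = buildRagPrompt('Is there a veggie burger?', [
  'Veggie burger: 9 USD',
  'Fries: 3 USD',
]);
console.log(ragPrompt);
```

Keeping prompt assembly in a plain function like this makes it easy to unit test separately from the retriever and the model call.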
Creating custom indexer flows
You can also create custom indexer flows, utilizing more options and features provided by LanceDB.
export const menuPdfIndexer = lancedbIndexerRef({
  // Using all defaults, for dbUri, tableName, and embedder, etc
});

const chunkingConfig = {
  minLength: 1000,
  maxLength: 2000,
  splitter: 'sentence',
  overlap: 100,
  delimiters: '',
} as any;

async function extractTextFromPdf(filePath: string) {
  const pdfFile = path.resolve(filePath);
  const dataBuffer = await readFile(pdfFile);
  const data = await pdf(dataBuffer);
  return data.text;
}

export const indexMenu = ai.defineFlow(
  {
    name: 'indexMenu',
    inputSchema: z.string().describe('PDF file path'),
    outputSchema: z.void(),
  },
  async (filePath: string) => {
    filePath = path.resolve(filePath);

    // Read the pdf.
    const pdfTxt = await ai.run('extract-text', () => extractTextFromPdf(filePath));

    // Divide the pdf text into segments.
    const chunks = await ai.run('chunk-it', async () => chunk(pdfTxt, chunkingConfig));

    // Convert chunks of text into documents to store in the index.
    const documents = chunks.map((text) => {
      return Document.fromText(text, { filePath });
    });

    // Add documents to the index.
    await ai.index({
      indexer: menuPdfIndexer,
      documents,
      options: {
        writeMode: WriteMode.Overwrite,
      } as any,
    });
  }
);
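To give a feel for what the `maxLength` and `overlap` settings in the chunking config mean, here is a minimal sketch of fixed-size chunking with overlap. It is an assumption-laden simplification: `chunkWithOverlap` is a hypothetical helper, it splits on raw character offsets, and it is not how llm-chunk implements its sentence-aware splitting.

```typescript
// Hypothetical fixed-size chunker with overlap; llm-chunk's real
// sentence-aware splitting is more sophisticated than this.
function chunkWithOverlap(text: string, maxLength: number, overlap: number): string[] {
  if (maxLength <= 0 || overlap < 0 || overlap >= maxLength) {
    throw new Error('require maxLength > 0 and 0 <= overlap < maxLength');
  }
  const chunks: string[] = [];
  const step = maxLength - overlap; // how far each window advances
  for (let start = 0; start < text.length; start += step) {
    chunks.push(text.slice(start, start + maxLength));
    if (start + maxLength >= text.length) break; // last window reached the end
  }
  return chunks;
}

// Each chunk shares its last character with the start of the next one.
const pieces = chunkWithOverlap('abcdefghij', 4, 1);
console.log(pieces);
```

The overlap means neighbouring chunks share some text, so a sentence that straddles a boundary is still retrievable from at least one chunk.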
In your console, you can see the logs.
Creating custom retriever flows
You can also create custom retriever flows, utilizing more options and features provided by LanceDB.
export const menuRetriever = lancedbRetrieverRef({
  tableName: "table", // Use the same table name as the indexer.
  displayName: "Menu", // Use a custom display name.
});

export const menuQAFlow = ai.defineFlow(
  { name: "Menu", inputSchema: z.string(), outputSchema: z.string() },
  async (input: string) => {
    // retrieve relevant documents
    const docs = await ai.retrieve({
      retriever: menuRetriever,
      query: input,
      options: {
        k: 3,
      },
    });

    // Pull the stored content out of each retrieved document.
    const extractedContent = docs.map(doc => {
      if (doc.content && Array.isArray(doc.content) && doc.content.length > 0) {
        if (doc.content[0].media && doc.content[0].media.url) {
          return doc.content[0].media.url;
        }
      }
      return "No content found";
    });

    console.log("Extracted content:", extractedContent);

    // Generate the answer, grounded in the retrieved context.
    const { text } = await ai.generate({
      model: gemini('gemini-2.0-flash'),
      prompt: `
You are acting as a helpful AI assistant that can answer
questions about the food available on the menu at Genkit Grub Pub.

Use only the context provided to answer the question.
If you don't know, do not make up an answer.
Do not add or change items on the menu.

Context:
${extractedContent.join('\n\n')}

Question: ${input}`,
      docs,
    });

    return text;
  }
);
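The content-extraction step in the flow above can be isolated into a small, testable helper. The sketch below assumes only the fields the flow actually reads; the `DocPart` and `RetrievedDoc` types and the `extractContent` function are hypothetical stand-ins, not genkit's own `Document` type.

```typescript
// Hypothetical minimal shapes, mirroring only the fields the flow reads.
type DocPart = { text?: string; media?: { url?: string } };
type RetrievedDoc = { content?: DocPart[] };

// Take the first part's media URL from each document, as the flow does,
// falling back to a placeholder string when it is missing or empty.
function extractContent(docs: RetrievedDoc[]): string[] {
  return docs.map((doc) => doc.content?.[0]?.media?.url || 'No content found');
}

// Toy documents, made up for illustration only.
const extracted = extractContent([
  { content: [{ media: { url: 'Burger, 10 USD' } }] },
  { content: [] },
  {},
]);
console.log(extracted);
```

Using `||` rather than `??` keeps the same behaviour as the original `if` checks, which also treat an empty string as missing.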