
genkitx-lancedb

This is a LanceDB plugin for the Genkit framework. It lets you use LanceDB to ingest and retrieve data in Genkit.


Installation

```bash
pnpm install genkitx-lancedb
```

Usage

Add the LanceDB plugin to your Genkit instance:

```typescript
import { lancedbIndexerRef, lancedb, lancedbRetrieverRef, WriteMode } from 'genkitx-lancedb';
import { textEmbedding004, vertexAI, gemini } from '@genkit-ai/vertexai';
import { z, genkit } from 'genkit';
import { Document } from 'genkit/retriever';
import { chunk } from 'llm-chunk';
import { readFile } from 'fs/promises';
import path from 'path';
import pdf from 'pdf-parse/lib/pdf-parse';

const ai = genkit({
  plugins: [
    // vertexAI provides the textEmbedding004 embedder
    vertexAI(),

    // the local vector store requires an embedder to translate from text to vector
    lancedb([
      {
        dbUri: '.db', // optional LanceDB URI, defaults to '.db'
        tableName: 'table', // optional table name, defaults to 'table'
        embedder: textEmbedding004,
      },
    ]),
  ],
});
```

You can run this app with the following command:

```bash
genkit start -- tsx --watch src/index.ts
```

This adds LanceDB as a retriever and an indexer to the Genkit instance. You can see both in the Genkit Developer UI.

Testing retrieval on a sample table

Let's look at the raw retrieval results.

Running this query fetches 5 results from the LanceDB table.
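To make the ranking step concrete: conceptually, a vector retriever embeds the query and ranks stored rows by similarity, returning the top k. The sketch below illustrates that idea with brute-force cosine similarity over tiny hand-made vectors. The types, data, and choice of metric are assumptions for illustration only; LanceDB's actual search (distance metric, approximate indexes) may differ.

```typescript
// Conceptual sketch of top-k vector retrieval using brute-force cosine similarity.
// Hypothetical types and data; not LanceDB's implementation.

interface StoredRow {
  text: string;
  vector: number[];
}

interface ScoredRow extends StoredRow {
  score: number;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error(`Dimension mismatch: ${a.length} vs ${b.length}`);
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero for zero vectors
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every row against the query vector and keep the k best matches.
function topK(rows: StoredRow[], query: number[], k: number): ScoredRow[] {
  return rows
    .map((row) => ({ ...row, score: cosineSimilarity(row.vector, query) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

// Usage with 3-dimensional toy vectors.
const rows: StoredRow[] = [
  { text: 'burger', vector: [1, 0, 0] },
  { text: 'salad', vector: [0, 1, 0] },
  { text: 'cheeseburger', vector: [0.9, 0.1, 0] },
  { text: 'soup', vector: [0, 0, 1] },
];

const results = topK(rows, [1, 0, 0], 2);
console.log(results.map((r) => r.text)); // [ 'burger', 'cheeseburger' ]
```

A real deployment would replace the brute-force scan with the database's own search, which matters once the table holds many rows.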

Creating a custom RAG flow

Now that we've seen how to use LanceDB in a Genkit pipeline, let's refine the flow and build a RAG (retrieval-augmented generation) pipeline. A RAG flow consists of an indexer and a retriever; the retriever's outputs are post-processed and fed into an LLM to produce the final response.
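The three stages of a RAG flow (retrieve, post-process, generate) can be sketched without any framework. In this minimal sketch, `Retriever` and `Generator` are hypothetical function types standing in for `ai.retrieve` and `ai.generate`, and the stub implementations replace the database and the model so the example runs on its own.

```typescript
// Framework-free sketch of a RAG flow: retrieve -> post-process -> generate.
// Retriever and Generator are hypothetical stand-ins for ai.retrieve / ai.generate.

type Retriever = (query: string, k: number) => Promise<string[]>;
type Generator = (prompt: string) => Promise<string>;

// Post-processing: combine the retrieved passages and the question into one prompt.
function buildPrompt(passages: string[], question: string): string {
  const context = passages.length > 0 ? passages.join('\n\n') : '(no context found)';
  return [
    'Use only the context provided to answer the question.',
    '',
    'Context:',
    context,
    '',
    `Question: ${question}`,
  ].join('\n');
}

async function answerWithRag(
  question: string,
  retrieve: Retriever,
  generate: Generator,
  k = 3,
): Promise<string> {
  const passages = await retrieve(question, k); // 1. retrieval
  const prompt = buildPrompt(passages, question); // 2. post-processing
  return generate(prompt); // 3. generation
}

// Stub retriever: substring match over an in-memory "menu" instead of a vector search.
const menu = ['Classic burger: $12', 'Garden salad: $9', 'Tomato soup: $6'];
const stubRetriever: Retriever = async (query, k) =>
  menu.filter((item) => item.toLowerCase().includes(query.toLowerCase())).slice(0, k);

// Stub generator: echoes the prompt so you can inspect what the model would receive.
const stubGenerator: Generator = async (prompt) => prompt;

answerWithRag('burger', stubRetriever, stubGenerator).then((output) => console.log(output));
```

Passing the retriever and generator in as functions keeps each stage swappable, so the same flow shape works whether the backing store is LanceDB or a test stub.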

Creating custom indexer flows

You can also create custom indexer flows, utilizing more options and features provided by LanceDB.

```typescript
export const menuPdfIndexer = lancedbIndexerRef({
  // Using all defaults for dbUri, tableName, embedder, etc.
});

const chunkingConfig = {
  minLength: 1000,
  maxLength: 2000,
  splitter: 'sentence',
  overlap: 100,
  delimiters: '',
} as any;

async function extractTextFromPdf(filePath: string) {
  const pdfFile = path.resolve(filePath);
  const dataBuffer = await readFile(pdfFile);
  const data = await pdf(dataBuffer);
  return data.text;
}

export const indexMenu = ai.defineFlow(
  {
    name: 'indexMenu',
    inputSchema: z.string().describe('PDF file path'),
    outputSchema: z.void(),
  },
  async (filePath: string) => {
    filePath = path.resolve(filePath);

    // Read the PDF.
    const pdfTxt = await ai.run('extract-text', () => extractTextFromPdf(filePath));

    // Divide the PDF text into segments.
    const chunks = await ai.run('chunk-it', async () => chunk(pdfTxt, chunkingConfig));

    // Convert chunks of text into documents to store in the index.
    const documents = chunks.map((text) => {
      return Document.fromText(text, { filePath });
    });

    // Add documents to the index.
    await ai.index({
      indexer: menuPdfIndexer,
      documents,
      options: {
        writeMode: WriteMode.Overwrite,
      } as any,
    });
  }
);
```
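To see roughly what settings like `maxLength` and `overlap` control, here is a simplified, hypothetical sentence chunker. It is not `llm-chunk`'s implementation and its semantics may differ: it measures sizes in characters, splits sentences with a naive regex, carries the last `overlap` characters of each chunk into the next one, and does not model `minLength`, `splitter`, or `delimiters`.

```typescript
// Simplified, hypothetical sentence chunker illustrating maxLength and overlap.
// NOT llm-chunk's implementation; sizes are measured in characters.

interface ChunkOptions {
  maxLength: number; // upper bound on chunk length, in characters
  overlap: number; // characters carried over from the end of the previous chunk
}

// Naive sentence split: runs of text ending in '.', '!' or '?', plus any trailing text.
function splitSentences(text: string): string[] {
  const matches = text.match(/[^.!?]+[.!?]+|[^.!?]+$/g) ?? [];
  return matches.map((s) => s.trim()).filter((s) => s.length > 0);
}

const joinWithSpace = (a: string, b: string): string => (a ? `${a} ${b}` : b);

function chunkBySentence(text: string, { maxLength, overlap }: ChunkOptions): string[] {
  if (overlap < 0 || overlap >= maxLength) {
    throw new Error('overlap must be in [0, maxLength)');
  }
  const chunks: string[] = [];
  let current = '';

  for (const sentence of splitSentences(text)) {
    const candidate = joinWithSpace(current, sentence);
    if (candidate.length <= maxLength) {
      current = candidate; // the sentence still fits in the current chunk
      continue;
    }
    // Emit the current chunk and seed the next one with its tail.
    const tail = overlap > 0 ? current.slice(-overlap) : '';
    if (current) chunks.push(current);
    current = joinWithSpace(tail, sentence);
    // Hard-split anything still too long (e.g. a single very long sentence).
    while (current.length > maxLength) {
      chunks.push(current.slice(0, maxLength));
      current = current.slice(maxLength - overlap);
    }
  }
  if (current) chunks.push(current);
  return chunks;
}

console.log(chunkBySentence('One. Two. Three.', { maxLength: 12, overlap: 4 }));
// [ 'One. Two.', 'Two. Three.' ]
```

The overlap repeats a little text at each boundary so that a fact split across two chunks still appears whole in at least one of them, at the cost of storing some text twice.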


In your console, you can see the logs.


Creating custom retriever flows

You can also create custom retriever flows, utilizing more options and features provided by LanceDB.

```typescript
export const menuRetriever = lancedbRetrieverRef({
  tableName: 'table', // Use the same table name as the indexer.
  displayName: 'Menu', // Use a custom display name.
});

export const menuQAFlow = ai.defineFlow(
  { name: 'Menu', inputSchema: z.string(), outputSchema: z.string() },
  async (input: string) => {
    // Retrieve the 3 most relevant documents.
    const docs = await ai.retrieve({
      retriever: menuRetriever,
      query: input,
      options: {
        k: 3,
      },
    });

    const extractedContent = docs.map((doc) => {
      const firstPart = doc.content?.[0];
      // Prefer a text part; fall back to a media URL if the content was stored as media.
      if (firstPart?.text) return firstPart.text;
      if (firstPart?.media?.url) return firstPart.media.url;
      return 'No content found';
    });
    console.log('Extracted content:', extractedContent);

    const { text } = await ai.generate({
      model: gemini('gemini-2.0-flash'),
      prompt: `
You are acting as a helpful AI assistant that can answer
questions about the food available on the menu at Genkit Grub Pub.

Use only the context provided to answer the question.
If you don't know, do not make up an answer.
Do not add or change items on the menu.

Context:
${extractedContent.join('\n\n')}

Question: ${input}`,
      docs,
    });

    return text;
  }
);
```
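The post-processing step in this flow, turning retrieved documents into context strings, can be factored into a pure function and tested without a database or model. In this sketch, `ContentPart` and `RetrievedDoc` are hypothetical, simplified stand-ins for Genkit's document types (not their real definitions), and the function joins all text parts of a document rather than reading only the first part.

```typescript
// Hypothetical, simplified shapes for retrieved documents (not Genkit's real types).
interface ContentPart {
  text?: string;
  media?: { url: string; contentType?: string };
}

interface RetrievedDoc {
  content: ContentPart[];
}

// Turn each retrieved document into one context string:
// concatenate its text parts, else fall back to the first media URL.
function extractContext(docs: RetrievedDoc[]): string[] {
  return docs.map((doc) => {
    const text = doc.content.map((part) => part.text ?? '').join('');
    if (text.length > 0) return text;
    const mediaPart = doc.content.find((part) => part.media?.url);
    return mediaPart?.media?.url ?? 'No content found';
  });
}

// Usage with hand-made documents.
const docs: RetrievedDoc[] = [
  { content: [{ text: 'Classic burger: ' }, { text: '$12' }] },
  { content: [{ media: { url: 'https://example.com/menu.png' } }] },
  { content: [] },
];

console.log(extractContext(docs).join('\n\n'));
```

Keeping this logic out of the flow body means a change in how documents are stored (text versus media) only touches one small, testable function.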

Now, using our retrieval flow, we can ask questions about the ingested PDF.
