LumoKit is a lightweight Swift library for Retrieval-Augmented Generation (RAG) systems. It integrates with PicoDocs for document parsing and VecturaKit for semantic search and vector storage.
The name LumoKit is derived from the Chinese characters 流 (liú), meaning "flow," and 模 (mó), meaning "model." It symbolizes the idea of flowing information through a model, reflecting data retrieval for a large language model.
- Parse and Chunk Documents: Use PicoDocs to extract content from files and split it into manageable chunks for efficient indexing.
- Semantic Search: Perform similarity-based searches using VecturaKit's vector database.
- Configurable Document Indexing: Set custom chunk sizes to control how documents are segmented for retrieval.
- Reset Database: Quickly reset the vector database to start fresh with new data.
Add the following dependency to your Package.swift file:

    dependencies: [
        .package(url: "https://github.com/rryam/LumoKit.git", from: "0.1.0"),
    ],

Then import the package in your project:

    import LumoKit

- Initialize LumoKit
First, set up the configuration for VecturaKit and initialize LumoKit:

    import LumoKit
    import VecturaKit

    let config = VecturaConfig(
        name: "my-vector-db",
        dimension: 384,
        searchOptions: VecturaConfig.SearchOptions(
            defaultNumResults: 10,
            minThreshold: 0.7
        )
    )

    let lumoKit = try LumoKit(config: config)

- Parse and Index Documents
Parse a file and index its content into the vector database:

    let fileURL = URL(fileURLWithPath: "/path/to/your/document.pdf")
    try await lumoKit.parseAndIndex(url: fileURL, chunkSize: 500)

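To give a feel for what a `chunkSize` of 500 could mean, here is a minimal sketch of fixed-size chunking. It assumes chunks are measured in characters and do not overlap; the README does not say how LumoKit actually segments text, so treat `chunkText` as a hypothetical helper rather than the library's implementation.

```swift
/// Splits `text` into consecutive, non-overlapping chunks of at most
/// `chunkSize` characters. Hypothetical helper for illustration only;
/// it is not part of LumoKit's API.
func chunkText(_ text: String, chunkSize: Int) -> [String] {
    precondition(chunkSize > 0, "chunkSize must be positive")
    var chunks: [String] = []
    var start = text.startIndex
    while start < text.endIndex {
        // Advance by chunkSize characters, stopping at the end of the string.
        let end = text.index(start, offsetBy: chunkSize, limitedBy: text.endIndex)
            ?? text.endIndex
        chunks.append(String(text[start..<end]))
        start = end
    }
    return chunks
}

// A 1200-character document with chunkSize 500 yields three chunks.
let sample = String(repeating: "a", count: 1200)
let chunks = chunkText(sample, chunkSize: 500)
print(chunks.map { $0.count }) // [500, 500, 200]
```

Smaller chunks tend to give more precise matches at the cost of more entries in the index; larger chunks keep more context per entry.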
- Perform Semantic Search
Search for relevant documents by querying the indexed database:

    let results = try await lumoKit.semanticSearch(
        query: "What is Swift?",
        numResults: 5,
        threshold: 0.7
    )

    for result in results {
        print("Document ID: \(result.id)")
        print("Text: \(result.text)")
        print("Score: \(result.score)")
    }

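The `numResults` and `threshold` arguments can be read as a cap and a cutoff on scored candidates. The sketch below shows one plausible interpretation on made-up data: drop anything scoring below the threshold, sort best first, then keep at most `numResults`. The `ScoredChunk` type, the integer IDs, the sample scores, and the inclusive comparison are all assumptions for illustration, not VecturaKit's actual types or behavior.

```swift
/// Hypothetical result type mirroring the fields printed above
/// (id, text, score). The real library's result type may differ.
struct ScoredChunk {
    let id: Int
    let text: String
    let score: Double
}

/// Keeps candidates scoring at least `threshold`, best first,
/// capped at `numResults`. The inclusive cutoff is an assumption.
func topResults(_ candidates: [ScoredChunk], numResults: Int, threshold: Double) -> [ScoredChunk] {
    let kept = candidates
        .filter { $0.score >= threshold }
        .sorted { $0.score > $1.score }
    return Array(kept.prefix(numResults))
}

// Made-up scores for illustration.
let candidates = [
    ScoredChunk(id: 1, text: "Swift is a programming language.", score: 0.91),
    ScoredChunk(id: 2, text: "Bananas are yellow.", score: 0.12),
    ScoredChunk(id: 3, text: "Swift was introduced by Apple.", score: 0.78),
    ScoredChunk(id: 4, text: "Swift supports async/await.", score: 0.72),
]

let top = topResults(candidates, numResults: 2, threshold: 0.7)
for result in top {
    print("Document ID: \(result.id)")
}
```

With these values, three candidates clear the 0.7 cutoff, and the cap of 2 keeps only the two highest-scoring ones.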
- Document Parsing: Leverages PicoDocs to parse various file formats (e.g., PDF, Markdown).
- Chunking: Splits the content into smaller chunks for efficient indexing.
- Vector Storage: Uses VecturaKit to store embeddings and perform similarity searches.
- Semantic Search: Retrieves the most relevant chunks for a given query.
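The four steps above can be sketched end to end with a toy in-memory store. Everything here is a stand-in: `toyEmbedding` counts letters instead of calling a real embedding model, and `ToyVectorStore` is a hypothetical type, not VecturaKit. The only technique carried over is ranking stored chunks by cosine similarity to the query.

```swift
/// Toy embedding: counts of the letters a-z (26 dimensions). A real system
/// would use a learned embedding model; this only makes the sketch runnable.
func toyEmbedding(_ text: String) -> [Double] {
    var counts = [Double](repeating: 0, count: 26)
    for scalar in text.lowercased().unicodeScalars {
        let value = scalar.value
        if value >= 97 && value <= 122 { // ASCII 'a'...'z'
            counts[Int(value - 97)] += 1
        }
    }
    return counts
}

/// Cosine similarity of two equal-length vectors; 0 if either is all zeros.
func cosineSimilarity(_ a: [Double], _ b: [Double]) -> Double {
    precondition(a.count == b.count, "vectors must have the same dimension")
    var dot = 0.0, normA = 0.0, normB = 0.0
    for i in 0..<a.count {
        dot += a[i] * b[i]
        normA += a[i] * a[i]
        normB += b[i] * b[i]
    }
    guard normA > 0, normB > 0 else { return 0 }
    return dot / (normA.squareRoot() * normB.squareRoot())
}

/// Hypothetical in-memory store: index chunks, search by similarity, reset.
struct ToyVectorStore {
    private(set) var entries: [(text: String, vector: [Double])] = []

    /// Embeds each chunk and stores it alongside its text.
    mutating func index(_ chunks: [String]) {
        for chunk in chunks {
            entries.append((text: chunk, vector: toyEmbedding(chunk)))
        }
    }

    /// Returns every stored chunk with its similarity to the query, best first.
    func search(_ query: String) -> [(text: String, score: Double)] {
        let queryVector = toyEmbedding(query)
        return entries
            .map { (text: $0.text, score: cosineSimilarity(queryVector, $0.vector)) }
            .sorted { $0.score > $1.score }
    }

    /// Removes all stored chunks.
    mutating func reset() {
        entries.removeAll()
    }
}

var store = ToyVectorStore()
store.index(["vector database", "banana bread recipe"])
let ranked = store.search("vector databases")
print(ranked.first?.text ?? "no results") // vector database
store.reset()
```

Letter counts capture spelling overlap rather than meaning, so this ranks by surface similarity only; a real embedding model is what makes the search semantic.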

    let fileURL = URL(fileURLWithPath: "/path/to/document.pdf")

    // Parse and index document
    try await lumoKit.parseAndIndex(url: fileURL, chunkSize: 500)

    // Perform semantic search
    let query = "Explain the importance of vector databases."
    let results = try await lumoKit.semanticSearch(query: query)

    for result in results {
        print("Relevant Text: \(result.text)")
    }

    // Reset the database
    try await lumoKit.resetDB()

Contributions are welcome! Please fork the repository and submit a pull request with your improvements or suggestions.
LumoKit is licensed under the MIT License. See the LICENSE file for more details.
- PicoDocs: For powerful document parsing.
- VecturaKit: For robust vector database functionality.