superlinear-ai/raglitePublic

NotificationsYou must be signed in to change notification settings
Fork94
Star1k

🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL

License

MPL-2.0 license

1k stars 94 forks Branches Tags Activity

Star

Notifications

You must be signed in to change notification settings

Branches Tags

Folders and files

Name		Name	Last commit message	Last commit date
Latest commit History 104 Commits
.devcontainer		.devcontainer
.github		.github
src/raglite		src/raglite
tests		tests
.copier-answers.yml		.copier-answers.yml
.dockerignore		.dockerignore
.gitignore		.gitignore
.pre-commit-config.yaml		.pre-commit-config.yaml
CHANGELOG.md		CHANGELOG.md
Dockerfile		Dockerfile
LICENSE		LICENSE
README.md		README.md
docker-compose.yml		docker-compose.yml
pyproject.toml		pyproject.toml

Repository files navigation

🥤 RAGLite

RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL.

Features

Configurable

🧠 Choose any LLM provider withLiteLLM, including localllama-cpp-python models
💾 Choose eitherDuckDB orPostgreSQL as a keyword & vector search database
🥇 Choose any reranker withrerankers, including multilingualFlashRank as the default

Fast and permissive

❤️ Only lightweight and permissive open source dependencies (e.g., noPyTorch orLangChain)
🚀 Acceleration with Metal on macOS, and CUDA on Linux and Windows

Unhobbled

📖 PDF to Markdown conversion on top ofpdftext andpypdfium2
🧬 Multi-vector chunk embedding withlate chunking andcontextual chunk headings
✏️ Optimal sentence splitting withwtpsplit-lite by solving abinary integer programming problem
✂️ Optimalsemantic chunking by solving abinary integer programming problem
🔍Hybrid search with the database's native keyword & vector search (FTS+VSS;tsvector+pgvector)
💭Adaptive retrieval where the LLM decides whether to and what to retrieve based on the query
💰 Improved cost and latency with aprompt caching-aware message array structure
🍰 Improved output quality withAnthropic's long-context prompt format
🌀 Optimalclosed-form linear query adapter by solving anorthogonal Procrustes problem

Extensible

🔌 A built-inModel Context Protocol (MCP) server that any MCP client likeClaude desktop can connect with
💬 Optional customizable ChatGPT-like frontend forweb,Slack, andTeams withChainlit
✍️ Optional conversion of any input document to Markdown withPandoc
✅ Optional evaluation of retrieval and generation performance withRagas

Installing

Tip

🚀 If you want to use local models, it is recommended to installan accelerated llama-cpp-python precompiled binary with:

# Configure which llama-cpp-python precompiled binary to install (⚠️ not every combination is available):LLAMA_CPP_PYTHON_VERSION=0.3.9PYTHON_VERSION=310|311|312ACCELERATOR=metal|cu121|cu122|cu123|cu124PLATFORM=macosx_11_0_arm64|linux_x86_64|win_amd64# Install llama-cpp-python:pip install"https://github.com/abetlen/llama-cpp-python/releases/download/v$LLAMA_CPP_PYTHON_VERSION-$ACCELERATOR/llama_cpp_python-$LLAMA_CPP_PYTHON_VERSION-cp$PYTHON_VERSION-cp$PYTHON_VERSION-$PLATFORM.whl"

Install RAGLite with:

pip install raglite

To add support for a customizable ChatGPT-like frontend, use thechainlit extra:

pip install raglite[chainlit]

To add support for filetypes other than PDF, use thepandoc extra:

pip install raglite[pandoc]

To add support for evaluation, use theragas extra:

pip install raglite[ragas]

Using

Overview

1. Configuring RAGLite

Tip

🧠 RAGLite extendsLiteLLM with support forllama.cpp models usingllama-cpp-python. To select a llama.cpp model (e.g., fromUnsloth's collection), use a model identifier of the form"llama-cpp-python/<hugging_face_repo_id>/<filename>@<n_ctx>", wheren_ctx is an optional parameter that specifies the context size of the model.

Tip

💾 You can create a PostgreSQL database in a few clicks atneon.tech.

First, configure RAGLite with your preferred DuckDB or PostgreSQL database andany LLM supported by LiteLLM:

fromragliteimportRAGLiteConfig# Example 'remote' config with a PostgreSQL database and an OpenAI LLM:my_config=RAGLiteConfig(db_url="postgresql://my_username:my_password@my_host:5432/my_database",llm="gpt-4o-mini",# Or any LLM supported by LiteLLMembedder="text-embedding-3-large",# Or any embedder supported by LiteLLM)# Example 'local' config with a DuckDB database and a llama.cpp LLM:my_config=RAGLiteConfig(db_url="duckdb:///raglite.db",llm="llama-cpp-python/unsloth/Qwen3-8B-GGUF/*Q4_K_M.gguf@8192",embedder="llama-cpp-python/lm-kit/bge-m3-gguf/*F16.gguf@512",# More than 512 tokens degrades bge-m3's performance)

You can also configureany reranker supported by rerankers:

fromrerankersimportReranker# Example remote API-based reranker:my_config=RAGLiteConfig(db_url="postgresql://my_username:my_password@my_host:5432/my_database"reranker=Reranker("rerank-v3.5",model_type="cohere",api_key=COHERE_API_KEY,verbose=0)# Multilingual)# Example local cross-encoder reranker per language (this is the default):my_config=RAGLiteConfig(db_url="duckdb:///raglite.db",reranker={"en":Reranker("ms-marco-MiniLM-L-12-v2",model_type="flashrank",verbose=0),# English"other":Reranker("ms-marco-MultiBERT-L-12",model_type="flashrank",verbose=0),# Other languages    })

2. Inserting documents

Tip

✍️ To insert documents other than PDF, install thepandoc extra withpip install raglite[pandoc].

Next, insert some documents into the database. RAGLite will take care of theconversion to Markdown,optimal level 4 semantic chunking, andmulti-vector embedding with late chunking:

# Insert documents given their file pathfrompathlibimportPathfromragliteimportDocument,insert_documentsdocuments= [Document.from_path(Path("On the Measure of Intelligence.pdf")),Document.from_path(Path("Special Relativity.pdf")),]insert_documents(documents,config=my_config)# Insert documents given their text/plain or text/markdown contentcontent="""# ON THE ELECTRODYNAMICS OF MOVING BODIES## By A. EINSTEIN  June 30, 1905It is known that Maxwell..."""documents= [Document.from_text(content)]insert_documents(documents,config=my_config)

3. Retrieval-Augmented Generation (RAG)

3.1 Adaptive RAG

Now you can run an adaptive RAG pipeline that consists of adding the user prompt to the message history and streaming the LLM response:

fromragliteimportrag# Create a user messagemessages= []# Or start with an existing message historymessages.append({"role":"user","content":"How is intelligence measured?"})# Adaptively decide whether to retrieve and then stream the responsechunk_spans= []stream=rag(messages,on_retrieval=lambdax:chunk_spans.extend(x),config=my_config)forupdateinstream:print(update,end="")# Access the documents referenced in the RAG contextdocuments= [chunk_span.documentforchunk_spaninchunk_spans]

The LLM will adaptively decide whether to retrieve information based on the complexity of the user prompt. If retrieval is necessary, the LLM generates the search query and RAGLite applies hybrid search and reranking to retrieve the most relevant chunk spans (each of which is a list of consecutive chunks). The retrieval results are sent to theon_retrieval callback and are appended to the message history as a tool output. Finally, the assistant response is streamed and appended to the message history.

3.2 Programmable RAG

If you need manual control over the RAG pipeline, you can run a basic but powerful pipeline that consists of retrieving the most relevant chunk spans with hybrid search and reranking, converting the user prompt to a RAG instruction and appending it to the message history, and finally generating the RAG response:

fromragliteimportadd_context,rag,retrieve_context,vector_search# Choose a search methodfromdataclassesimportreplacemy_config=replace(my_config,search_method=vector_search)# Or `hybrid_search`, `search_and_rerank_chunks`, ...# Retrieve relevant chunk spans with the configured search methoduser_prompt="How is intelligence measured?"chunk_spans=retrieve_context(query=user_prompt,num_chunks=5,config=my_config)# Append a RAG instruction based on the user prompt and context to the message historymessages= []# Or start with an existing message historymessages.append(add_context(user_prompt=user_prompt,context=chunk_spans))# Stream the RAG response and append it to the message historystream=rag(messages,config=my_config)forupdateinstream:print(update,end="")# Access the documents referenced in the RAG contextdocuments= [chunk_span.documentforchunk_spaninchunk_spans]

Tip

🥇 Reranking can significantly improve the output quality of a RAG application. To add reranking to your application: first search for a larger set of 20 relevant chunks, then rerank them with arerankers reranker, and finally keep the top 5 chunks.

RAGLite also offers more advanced control over the individual steps of a full RAG pipeline:

Searching for relevant chunks with keyword, vector, or hybrid search
Retrieving the chunks from the database
Reranking the chunks and selecting the top 5 results
Extending the chunks with their neighbors and grouping them into chunk spans
Converting the user prompt to a RAG instruction and appending it to the message history
Streaming an LLM response to the message history
Accessing the cited documents from the chunk spans

A full RAG pipeline is straightforward to implement with RAGLite:

# Search for chunksfromragliteimporthybrid_search,keyword_search,vector_searchuser_prompt="How is intelligence measured?"chunk_ids_vector,_=vector_search(user_prompt,num_results=20,config=my_config)chunk_ids_keyword,_=keyword_search(user_prompt,num_results=20,config=my_config)chunk_ids_hybrid,_=hybrid_search(user_prompt,num_results=20,config=my_config)# Retrieve chunksfromragliteimportretrieve_chunkschunks_hybrid=retrieve_chunks(chunk_ids_hybrid,config=my_config)# Rerank chunks and keep the top 5 (optional, but recommended)fromragliteimportrerank_chunkschunks_reranked=rerank_chunks(user_prompt,chunks_hybrid,config=my_config)chunks_reranked=chunks_reranked[:5]# Extend chunks with their neighbors and group them into chunk spansfromragliteimportretrieve_chunk_spanschunk_spans=retrieve_chunk_spans(chunks_reranked,config=my_config)# Append a RAG instruction based on the user prompt and context to the message historyfromragliteimportadd_contextmessages= []# Or start with an existing message historymessages.append(add_context(user_prompt=user_prompt,context=chunk_spans))# Stream the RAG response and append it to the message historyfromragliteimportragstream=rag(messages,config=my_config)forupdateinstream:print(update,end="")# Access the documents referenced in the RAG contextdocuments= [chunk_span.documentforchunk_spaninchunk_spans]

4. Computing and using an optimal query adapter

RAGLite can compute and apply anoptimal closed-form query adapter to the prompt embedding to improve the output quality of RAG. To benefit from this, first generate a set of evals withinsert_evals and then compute and store the optimal query adapter withupdate_query_adapter:

# Improve RAG with an optimal query adapterfromragliteimportinsert_evals,update_query_adapterinsert_evals(num_evals=100,config=my_config)update_query_adapter(config=my_config)# From here, every vector search will use the query adapter

5. Evaluation of retrieval and generation

If you installed theragas extra, you can use RAGLite to answer the evals and then evaluate the quality of both the retrieval and generation steps of RAG usingRagas:

# Evaluate retrieval and generationfromragliteimportanswer_evals,evaluate,insert_evalsinsert_evals(num_evals=100,config=my_config)answered_evals_df=answer_evals(num_evals=10,config=my_config)evaluation_df=evaluate(answered_evals_df,config=my_config)

6. Running a Model Context Protocol (MCP) server

RAGLite comes with anMCP server implemented withFastMCP that exposes asearch_knowledge_basetool. To use the server:

InstallClaude desktop
Installuv so that Claude desktop can start the server
Configure Claude desktop to useuv to start the MCP server with:

raglite \    --db-url duckdb:///raglite.db \    --llm llama-cpp-python/unsloth/Qwen3-4B-GGUF/*Q4_K_M.gguf@8192 \    --embedder llama-cpp-python/lm-kit/bge-m3-gguf/*F16.gguf@512 \    mcp install

To use an API-based LLM, make sure to include your credentials in a.env file or supply them inline:

export OPENAI_API_KEY=sk-...raglite \    --llm gpt-4o-mini \    --embedder text-embedding-3-large \    mcp install

Now, when you start Claude desktop you should see a 🔨 icon at the bottom right of your prompt indicating that the Claude has successfully connected with the MCP server.

When relevant, Claude will suggest to use thesearch_knowledge_base tool that the MCP server provides. You can also explicitly ask Claude to search the knowledge base if you want to be certain that it does.

raglite_mcp.mov

7. Serving a customizable ChatGPT-like frontend

If you installed thechainlit extra, you can serve a customizable ChatGPT-like frontend with:

raglite chainlit

The application is also deployable toweb,Slack, andTeams.

You can specify the database URL, LLM, and embedder directly in the Chainlit frontend, or with the CLI as follows:

raglite \    --db-url duckdb:///raglite.db \    --llm llama-cpp-python/unsloth/Qwen3-4B-GGUF/*Q4_K_M.gguf@8192 \    --embedder llama-cpp-python/lm-kit/bge-m3-gguf/*F16.gguf@512 \    chainlit

To use an API-based LLM, make sure to include your credentials in a.env file or supply them inline:

OPENAI_API_KEY=sk-... raglite --llm gpt-4o-mini --embedder text-embedding-3-large chainlit

raglite_chainlit.mov

Contributing

Prerequisites

Generate an SSH key andadd the SSH key to your GitHub account.

Configure SSH to automatically load your SSH keys:

cat<<EOF >> ~/.ssh/configHost *  AddKeysToAgent yes  IgnoreUnknown UseKeychain  UseKeychain yes  ForwardAgent yesEOF

Install Docker Desktop.
Install VS Code andVS Code's Dev Containers extension. Alternatively, installPyCharm.
Optional: install aNerd Font such asFiraCode Nerd Font andconfigure VS Code orPyCharm to use it.

Development environments

The following development environments are supported:

⭐️GitHub Codespaces: click onOpen in GitHub Codespaces to start developing in your browser.
⭐️VS Code Dev Container (with container volume): click onOpen in Dev Containers to clone this repository in a container volume and create a Dev Container with VS Code.

⭐️uv: clone this repository and run the following from root of the repository:

# Create and install a virtual environmentuv sync --python 3.10 --all-extras# Activate the virtual environmentsource .venv/bin/activate# Install the pre-commit hookspre-commit install --install-hooks

VS Code Dev Container: clone this repository, open it with VS Code, and runCtrl/⌘ +⇧ +P →Dev Containers: Reopen in Container.
PyCharm Dev Container: clone this repository, open it with PyCharm,create a Dev Container with Mount Sources, andconfigure an existing Python interpreter at/opt/venv/bin/python.

Developing

This project follows theConventional Commits standard to automateSemantic Versioning andKeep A Changelog withCommitizen.
Runpoe from within the development environment to print a list ofPoe the Poet tasks available to run on this project.
Runuv add {package} from within the development environment to install a run time dependency and add it topyproject.toml anduv.lock. Add--dev to install a development dependency.
Runuv sync --upgrade from within the development environment to upgrade all dependencies to the latest versions allowed bypyproject.toml. Add--only-dev to upgrade the development dependencies only.
Runcz bump to bump the package's version, update theCHANGELOG.md, and create a git tag. Then push the changes and the git tag withgit push origin main --tags.

Star History

About

🥤 RAGLite is a Python toolkit for Retrieval-Augmented Generation (RAG) with DuckDB or PostgreSQL

Releases17

v1.0.0 Latest

Jun 11, 2025

+ 16 releases

Movatterモバイル変換

License

superlinear-ai/raglite

Folders and files

Latest commit

History

Repository files navigation

🥤 RAGLite

Features

Configurable

Fast and permissive

Unhobbled

Extensible

Installing

Using

Overview

1. Configuring RAGLite

2. Inserting documents

3. Retrieval-Augmented Generation (RAG)

3.1 Adaptive RAG

3.2 Programmable RAG

4. Computing and using an optimal query adapter

5. Evaluation of retrieval and generation

6. Running a Model Context Protocol (MCP) server

7. Serving a customizable ChatGPT-like frontend

Contributing

Star History

About

Topics

Resources

License

Uh oh!

Stars

Watchers

Forks

Releases17

Uh oh!

Contributors9

Uh oh!

Languages