rag-evaluation
Here are 66 public repositories matching this topic...
🐢 Open-Source Evaluation & Testing library for LLM Agents
Updated Feb 20, 2026 - Python
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
Updated Dec 23, 2025 - Python
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
Updated Feb 20, 2026 - TypeScript
RAG evaluation without the need for "golden answers" (a generic reference-free scoring sketch follows below).
Updated Dec 15, 2025 - Python
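Reference-free ("golden-answer-free") evaluation scores the generated answer against the retrieved context itself rather than against a ground-truth answer. Below is a minimal sketch of one crude proxy, token-overlap faithfulness; this is a generic illustration, not this repository's actual method, and all names in it are hypothetical:

```python
import re

def faithfulness_overlap(answer: str, context: str) -> float:
    """Fraction of answer tokens that also appear in the retrieved context.

    A crude reference-free faithfulness proxy: low overlap suggests the
    answer contains content not grounded in the retrieved passages.
    """
    def tokenize(s: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", s.lower()))

    answer_tokens = tokenize(answer)
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & tokenize(context)) / len(answer_tokens)

context = "The Eiffel Tower was completed in 1889 and stands in Paris."
print(faithfulness_overlap("The Eiffel Tower was completed in 1889.", context))  # 1.0
print(faithfulness_overlap("It was built by the Romans in 200 BC.", context))    # ~0.33
```

Production reference-free evaluators typically swap the lexical overlap for an LLM judge or an NLI model, but the interface, a score from (answer, context) alone, is the same.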
Framework for testing vulnerabilities of large language models (LLMs).
Updated Jan 16, 2026 - Python
RAG boilerplate with semantic/propositional chunking, hybrid search (BM25 + dense), LLM reranking, query-enhancement agents, CrewAI orchestration, Qdrant vector search, Redis/Mongo session management, a Celery ingestion pipeline, a Gradio UI, and an evaluation suite (Hit-Rate, MRR, hybrid configs; both metrics are sketched below).
Updated Nov 18, 2025 - Python
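Since this entry names Hit-Rate and MRR, here is how those two retrieval metrics are conventionally computed: a minimal plain-Python sketch assuming one relevant document per query and a ranked list of retrieved IDs (the function and data layout are illustrative, not this repo's API):

```python
def hit_rate_and_mrr(results, k=10):
    """Compute Hit-Rate@k and MRR@k over (relevant_id, ranked_ids) pairs."""
    hits, reciprocal_ranks = 0, 0.0
    for relevant_id, ranked_ids in results:
        top_k = ranked_ids[:k]
        if relevant_id in top_k:
            hits += 1
            # Ranks are 1-based: a hit at list position 0 contributes 1/1.
            reciprocal_ranks += 1.0 / (top_k.index(relevant_id) + 1)
    n = len(results)
    return hits / n, reciprocal_ranks / n

# Toy usage: two queries, the second misses entirely.
results = [
    ("doc_a", ["doc_b", "doc_a", "doc_c"]),  # hit at rank 2 -> RR 0.5
    ("doc_x", ["doc_b", "doc_c", "doc_d"]),  # miss -> RR 0
]
hit_rate, mrr = hit_rate_and_mrr(results, k=3)
print(f"Hit-Rate@3 = {hit_rate:.2f}, MRR@3 = {mrr:.2f}")  # 0.50, 0.25
```

Hit-Rate only asks whether the relevant document appears in the top k at all; MRR additionally rewards ranking it higher, which is why the two are usually reported together.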
Open-source framework for evaluating AI agents.
Updated Jan 23, 2026 - Python
⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or CLI. Privacy-first, async, visual reports.
Updated Feb 19, 2026 - Python
Evaluation Framework for LLM applications in Java and Kotlin
Updated Feb 13, 2026 - Java
smallevals: offline retrieval evaluation for RAG systems with tiny QA models; fast on CPU, blazing fast on GPU.
Updated Dec 4, 2025 - Python
Compares different Retrieval-Augmented Generation (RAG) frameworks on speed and performance.
Updated Jul 28, 2024 - Python
A framework for systematic evaluation of retrieval strategies and prompt engineering in RAG systems, featuring an interactive chat interface for document analysis.
Updated Dec 18, 2024 - Python
Learn Retrieval-Augmented Generation (RAG) from scratch using Hugging Face LLMs, with LangChain or plain Python.
Updated Jan 20, 2025 - Jupyter Notebook
RAG Chatbot for Financial Analysis
Updated Mar 9, 2025 - Python
EvalWise is a developer-friendly platform for LLM evaluation and red teaming that helps test AI models for safety, compliance, and performance issues.
Updated Nov 20, 2025 - Python
A modular, multi-model AI assistant UI built on .NET 9, featuring RAG, extensible tools, and deep code + database knowledge through semantic search.
Updated Dec 10, 2025 - HTML
A comprehensive evaluation toolkit for assessing Retrieval-Augmented Generation (RAG) outputs using linguistic, semantic, and fairness metrics (a semantic-similarity sketch follows below).
Updated Apr 19, 2025 - Python
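On the semantic side of such metrics, a common building block is embedding cosine similarity between the generated answer and a reference (or the retrieved context). A minimal sketch using sentence-transformers; the library choice and model name are assumptions, not necessarily what this toolkit uses:

```python
# pip install sentence-transformers  (assumed dependency)
from sentence_transformers import SentenceTransformer, util

# all-MiniLM-L6-v2 is a small general-purpose embedding model; any
# sentence-embedding model would slot in the same way here.
model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_similarity(generated: str, reference: str) -> float:
    """Cosine similarity between sentence embeddings, roughly in [-1, 1]."""
    emb = model.encode([generated, reference], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

print(semantic_similarity(
    "The capital of France is Paris.",
    "Paris is France's capital city.",
))  # close to 1.0 for paraphrases
```

Paraphrases score near 1.0 even with little word overlap, which is exactly the failure mode of purely lexical metrics that semantic scoring is meant to cover.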
EntRAG - Enterprise RAG Benchmark
Updated Jun 10, 2025 - Python