rag-evaluation
Here are 66 public repositories matching this topic...
🐢 Open-Source Evaluation & Testing library for LLM Agents
Updated Feb 20, 2026 - Python
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
Updated Dec 23, 2025 - Python
The open-source LLMOps platform: prompt playground, prompt management, LLM evaluation, and LLM observability all in one place.
Updated Feb 20, 2026 - TypeScript
RAG evaluation without the need for "golden answers" (a generic reference-free scoring sketch follows below).
Updated Dec 15, 2025 - Python
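Reference-free ("golden-answer-free") evaluation scores the generated answer against the retrieved context itself rather than against a ground-truth answer. Below is a minimal sketch of one crude proxy, token-overlap faithfulness; this is a generic illustration, not this repository's actual method, and all names in it are hypothetical:

```python
import re

def faithfulness_overlap(answer: str, context: str) -> float:
    """Fraction of answer tokens that also appear in the retrieved context.

    A crude reference-free faithfulness proxy: low overlap suggests the
    answer contains content not grounded in the retrieved passages.
    """
    def tokenize(s: str) -> set[str]:
        return set(re.findall(r"[a-z0-9]+", s.lower()))

    answer_tokens = tokenize(answer)
    if not answer_tokens:
        return 0.0
    return len(answer_tokens & tokenize(context)) / len(answer_tokens)

context = "The Eiffel Tower was completed in 1889 and stands in Paris."
print(faithfulness_overlap("The Eiffel Tower was completed in 1889.", context))  # 1.0
print(faithfulness_overlap("It was built by the Romans in 200 BC.", context))    # ~0.33
```

Production reference-free evaluators typically swap the lexical overlap for an LLM judge or an NLI model, but the interface, a score from (answer, context) alone, is the same.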
Framework for testing vulnerabilities of large language models (LLMs).
Updated Jan 16, 2026 - Python
RAG boilerplate with semantic/propositional chunking, hybrid search (BM25 + dense), LLM reranking, query-enhancement agents, CrewAI orchestration, Qdrant vector search, Redis/Mongo session management, a Celery ingestion pipeline, a Gradio UI, and an evaluation suite (Hit-Rate, MRR, hybrid configs; both metrics are sketched below).
Updated Nov 18, 2025 - Python
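Since this entry names Hit-Rate and MRR, here is how those two retrieval metrics are conventionally computed: a minimal plain-Python sketch assuming one relevant document per query and a ranked list of retrieved IDs (the function and data layout are illustrative, not this repo's API):

```python
def hit_rate_and_mrr(results, k=10):
    """Compute Hit-Rate@k and MRR@k over (relevant_id, ranked_ids) pairs."""
    hits, reciprocal_ranks = 0, 0.0
    for relevant_id, ranked_ids in results:
        top_k = ranked_ids[:k]
        if relevant_id in top_k:
            hits += 1
            # Ranks are 1-based: a hit at list position 0 contributes 1/1.
            reciprocal_ranks += 1.0 / (top_k.index(relevant_id) + 1)
    n = len(results)
    return hits / n, reciprocal_ranks / n

# Toy usage: two queries, the second misses entirely.
results = [
    ("doc_a", ["doc_b", "doc_a", "doc_c"]),  # hit at rank 2 -> RR 0.5
    ("doc_x", ["doc_b", "doc_c", "doc_d"]),  # miss -> RR 0
]
hit_rate, mrr = hit_rate_and_mrr(results, k=3)
print(f"Hit-Rate@3 = {hit_rate:.2f}, MRR@3 = {mrr:.2f}")  # 0.50, 0.25
```

Hit-Rate only asks whether the relevant document appears in the top k at all; MRR additionally rewards ranking it higher, which is why the two are usually reported together.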
Open-source framework for evaluating AI agents.
Updated Jan 23, 2026 - Python
⚡️ The "1-Minute RAG Audit" — Generate QA datasets & evaluate RAG systems in Colab, Jupyter, or CLI. Privacy-first, async, visual reports.
Updated Feb 19, 2026 - Python
Evaluation Framework for LLM applications in Java and Kotlin
Updated Feb 13, 2026 - Java
smallevals: offline retrieval evaluation for RAG systems with tiny QA models; fast on CPU, blazing fast on GPU.
Updated Dec 4, 2025 - Python
Compares different Retrieval-Augmented Generation (RAG) frameworks on speed and performance.
Updated Jul 28, 2024 - Python
A framework for systematic evaluation of retrieval strategies and prompt engineering in RAG systems, featuring an interactive chat interface for document analysis.
Updated Dec 18, 2024 - Python
Learn Retrieval-Augmented Generation (RAG) from scratch using Hugging Face LLMs, with LangChain or plain Python.
Updated Jan 20, 2025 - Jupyter Notebook
RAG Chatbot for Financial Analysis
Updated Mar 9, 2025 - Python
EvalWise is a developer-friendly platform for LLM evaluation and red teaming that helps test AI models for safety, compliance, and performance issues.
Updated Nov 20, 2025 - Python
A modular, multi-model AI assistant UI built on .NET 9, featuring RAG, extensible tools, and deep code + database knowledge through semantic search.
Updated Dec 10, 2025 - HTML
A comprehensive evaluation toolkit for assessing Retrieval-Augmented Generation (RAG) outputs using linguistic, semantic, and fairness metrics (a semantic-similarity sketch follows below).
Updated Apr 19, 2025 - Python
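On the semantic side of such metrics, a common building block is embedding cosine similarity between the generated answer and a reference (or the retrieved context). A minimal sketch using sentence-transformers; the library choice and model name are assumptions, not necessarily what this toolkit uses:

```python
# pip install sentence-transformers  (assumed dependency)
from sentence_transformers import SentenceTransformer, util

# all-MiniLM-L6-v2 is a small general-purpose embedding model; any
# sentence-embedding model would slot in the same way here.
model = SentenceTransformer("all-MiniLM-L6-v2")

def semantic_similarity(generated: str, reference: str) -> float:
    """Cosine similarity between sentence embeddings, roughly in [-1, 1]."""
    emb = model.encode([generated, reference], convert_to_tensor=True)
    return util.cos_sim(emb[0], emb[1]).item()

print(semantic_similarity(
    "The capital of France is Paris.",
    "Paris is France's capital city.",
))  # close to 1.0 for paraphrases
```

Paraphrases score near 1.0 even with little word overlap, which is exactly the failure mode of purely lexical metrics that semantic scoring is meant to cover.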
EntRAG - Enterprise RAG Benchmark
Updated Jun 10, 2025 - Python