llm-eval
Here are 37 public repositories matching this topic...
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
- Updated Mar 17, 2025 - TypeScript
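The entry above pitches declarative, config-driven comparison of several models with CLI and CI/CD integration. As a rough illustration of that idea only — the config layout, the call_model() stub, and the substring check below are assumptions for this listing, not any particular tool's format — such a harness can be sketched in a few lines of Python:

```python
"""Generic sketch: run a small declarative test suite against several models.

The config layout, the call_model() stub, and the substring check are all
illustrative assumptions, not any particular project's configuration format.
"""

CONFIG = {
    "prompts": ["Summarize in one sentence: {text}"],
    "providers": ["gpt-4o-mini", "claude-3-5-haiku-latest", "gemini-1.5-flash"],
    "tests": [
        {
            "vars": {"text": "LLM evals catch regressions before release."},
            "expect_contains": "regressions",
        },
    ],
}


def call_model(provider: str, prompt: str) -> str:
    """Stand-in for a real provider call (OpenAI, Anthropic, Google, ...)."""
    raise NotImplementedError(f"wire the {provider} SDK in here")


def run_suite(config: dict) -> None:
    # Cross every provider with every prompt template and test case.
    for provider in config["providers"]:
        for template in config["prompts"]:
            for test in config["tests"]:
                prompt = template.format(**test["vars"])
                output = call_model(provider, prompt)
                ok = test["expect_contains"].lower() in output.lower()
                print(f"{provider}: {'PASS' if ok else 'FAIL'} - {prompt[:40]}")


if __name__ == "__main__":
    run_suite(CONFIG)
```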
AI Observability & Evaluation
- Updated Mar 17, 2025 - Jupyter Notebook
🐢 Open-Source Evaluation & Testing for AI & LLM systems
- Updated Mar 10, 2025 - Python
ETL, Analytics, Versioning for Unstructured Data
- Updated Mar 17, 2025 - Python
UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, and embedding use cases), perform root cause analysis on failure cases, and give insights on how to resolve them.
- Updated Aug 18, 2024 - Python
Python SDK for running evaluations on LLM-generated responses
- Updated Mar 17, 2025 - Python
Generate ideal question-answer pairs for testing RAG
- Updated Feb 25, 2025 - Python
A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.
- Updated Jan 29, 2024 - Python
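The entry above describes GPT-based, multi-aspect scoring of model outputs. A minimal sketch of that general LLM-as-judge pattern using the OpenAI Python SDK follows; the aspect names, rubric wording, and JSON shape are illustrative assumptions, not that tool's interface.

```python
"""Minimal sketch of the GPT-as-judge pattern: score one answer on several aspects.

The aspect list, rubric wording, and JSON shape are illustrative assumptions,
not the listed repository's API. Requires `pip install openai` and an
OPENAI_API_KEY in the environment.
"""
import json

from openai import OpenAI

client = OpenAI()
ASPECTS = ["relevance", "factuality", "coherence", "fluency"]


def judge(question: str, answer: str, model: str = "gpt-4o-mini") -> dict:
    rubric = (
        "Rate the answer to the question on each aspect from 1 (poor) to 5 (excellent). "
        f"Aspects: {', '.join(ASPECTS)}. "
        'Reply as a JSON object with one integer per aspect plus a short "rationale".\n\n'
        f"Question: {question}\nAnswer: {answer}"
    )
    resp = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": rubric}],
        response_format={"type": "json_object"},  # ask for parseable JSON
    )
    return json.loads(resp.choices[0].message.content)


# Example: print(judge("What causes tides?", "Mostly the Moon's gravity."))
```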
Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)
- Updated Feb 13, 2025 - Python
A benchmark comparing Russian-language ChatGPT analogues: Saiga, YandexGPT, Gigachat
- Updated Sep 26, 2023 - Jupyter Notebook
🎯 A free LLM evaluation toolkit that helps you assess factual accuracy, contextual understanding, tone, and more, so you can see how well your LLM applications perform.
- Updated Jan 7, 2025 - Python
Develop reliable AI apps
- Updated Mar 12, 2025 - Svelte
An open source library for asynchronous querying of LLM endpoints
- Updated Mar 3, 2025 - Python
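For the asynchronous-querying entry above, here is a generic sketch of bounded-concurrency fan-out with asyncio, using the OpenAI async client purely as an example backend; the semaphore limit and model name are illustrative choices, not the listed library's own API.

```python
"""Generic sketch: asynchronous fan-out to an LLM endpoint with bounded concurrency.

Uses the OpenAI async client as an example backend only; this is not the
listed library's interface. Requires `pip install openai`.
"""
import asyncio

from openai import AsyncOpenAI

client = AsyncOpenAI()  # reads OPENAI_API_KEY from the environment


async def ask(prompt: str, sem: asyncio.Semaphore, model: str = "gpt-4o-mini") -> str:
    async with sem:  # cap the number of in-flight requests
        resp = await client.chat.completions.create(
            model=model, messages=[{"role": "user", "content": prompt}]
        )
        return resp.choices[0].message.content


async def ask_all(prompts: list[str], max_concurrency: int = 8) -> list[str]:
    sem = asyncio.Semaphore(max_concurrency)
    return await asyncio.gather(*(ask(p, sem) for p in prompts))


# Example: answers = asyncio.run(ask_all(["Define perplexity.", "What is RAG?"]))
```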
Realign is a testing and simulation framework for AI applications.
- Updated Dec 4, 2024 - Python
Code for "Prediction-Powered Ranking of Large Language Models", NeurIPS 2024.
- Updated Oct 28, 2024 - Jupyter Notebook
Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.
- Updated Jan 14, 2025 - Jupyter Notebook
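The entry above suggests wiring an eval into your test suite. A minimal pytest sketch of that idea is below; the generate_answer() hook and the expected keywords are hypothetical placeholders standing in for your application code, not code from the repo.

```python
"""Sketch of folding an LLM eval into a pytest suite.

generate_answer() and the expected keywords are hypothetical placeholders
for your own application code, not taken from the listed repository.
"""
import pytest

CASES = [
    ("What is the capital of France?", ["paris"]),
    ("Name a transformer-based language model.", ["gpt", "bert", "llama"]),
]


def generate_answer(question: str) -> str:
    """Hypothetical hook into the LLM application under test."""
    raise NotImplementedError("call your application here")


@pytest.mark.parametrize("question,keywords", CASES)
def test_answer_mentions_expected_keyword(question, keywords):
    answer = generate_answer(question).lower()
    assert any(k in answer for k in keywords), f"none of {keywords} in {answer!r}"
```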
The prompt engineering, prompt management, and prompt evaluation tool for Python
- Updated Sep 17, 2024 - Python
The prompt engineering, prompt management, and prompt evaluation tool for TypeScript, JavaScript, and NodeJS.
- Updated Sep 14, 2024 - TypeScript