evaluation-metrics
Here are 662 public repositories matching this topic...
The LLM Evaluation Framework
- Updated
Dec 17, 2025 - Python
Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including CrewAI, Agno, OpenAI Agents SDK, Langchain, Autogen, AG2, and CamelAI
- Updated
Oct 30, 2025 - Python
"A Guide to Building Large Models as a White Box": a fully hand-built Tiny-Universe
- Updated
Dec 2, 2025 - Jupyter Notebook
Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends
- Updated
Dec 15, 2025 - Python
Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!
- Updated
Dec 3, 2025 - Jupyter Notebook
(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"
- Updated
Apr 3, 2024 - Python
[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.
- Updated
Aug 12, 2024 - Jupyter Notebook
Evaluate your speech-to-text system with similarity measures such as word error rate (WER)
- Updated
Feb 15, 2025 - Python
OCTIS: Comparing Topic Models is Simple! A Python package to optimize and evaluate topic models (accepted at the EACL 2021 demo track)
- Updated
Nov 24, 2025 - Python
A Neural Framework for MT Evaluation
- Updated
Sep 1, 2025 - Python
📈 Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.
- Updated
Aug 31, 2024 - Python
⚡️A Blazing-Fast Python Library for Ranking Evaluation, Comparison, and Fusion 🐍
- Updated
Aug 7, 2025 - Python
Data-Driven Evaluation for LLM-Powered Applications
- Updated
Jan 22, 2025 - Python
PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build simple language models. There are also more complex data types and algorithms. Mor…
- Updated
Sep 14, 2023 - Python
[RAL'25 & IROS'25] MapEval: Towards Unified, Robust and Efficient SLAM Map Evaluation Framework.
- Updated
Jul 15, 2025 - C++
Benchmark diffusion models faster. Automate evals, seeds, and metrics for reproducible results.
- Updated
Oct 3, 2025 - Python
A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems
- Updated
Sep 22, 2025
Source code for "Taming Visually Guided Sound Generation" (Oral at BMVC 2021)
- Updated
Jul 12, 2024 - Jupyter Notebook
RAG evaluation without the need for "golden answers"
- Updated
Dec 15, 2025 - Python
PySODEvalToolkit: A Python-based Evaluation Toolbox for Salient Object Detection and Camouflaged Object Detection
- Updated
Sep 27, 2024 - Python