# evaluation-metrics

Here are 662 public repositories matching this topic...

Python SDK for AI agent monitoring, LLM cost tracking, benchmarking, and more. Integrates with most LLMs and agent frameworks including CrewAI, Agno, OpenAI Agents SDK, Langchain, Autogen, AG2, and CamelAI

  • Updated Oct 30, 2025
  • Python

《大模型白盒子构建指南》 ("A White-Box Guide to Building Large Models"): a fully hand-built Tiny-Universe

  • Updated Dec 2, 2025
  • Jupyter Notebook

Lighteval is your all-in-one toolkit for evaluating LLMs across multiple backends

  • Updated Dec 15, 2025
  • Python

Sharing both practical insights and theoretical knowledge about LLM evaluation that we gathered while managing the Open LLM Leaderboard and designing lighteval!

  • Updated Dec 3, 2025
  • Jupyter Notebook

(IROS 2020, ECCVW 2020) Official Python Implementation for "3D Multi-Object Tracking: A Baseline and New Evaluation Metrics"

  • Updated Apr 3, 2024
  • Python

[NeurIPS'21 Outstanding Paper] Library for reliable evaluation on RL and ML benchmarks, even with only a handful of seeds.

  • Updated Aug 12, 2024
  • Jupyter Notebook

Evaluate your speech-to-text system with similarity measures such as word error rate (WER)

  • Updated Feb 15, 2025
  • Python
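Word error rate, mentioned above, is word-level edit distance normalized by reference length. A minimal self-contained sketch of the metric (an illustrative implementation, not the listed repository's code):

```python
def wer(reference: str, hypothesis: str) -> float:
    """Word error rate: word-level Levenshtein distance / reference length."""
    ref, hyp = reference.split(), hypothesis.split()
    # DP table: d[i][j] = edit distance between ref[:i] and hyp[:j]
    d = [[0] * (len(hyp) + 1) for _ in range(len(ref) + 1)]
    for i in range(len(ref) + 1):
        d[i][0] = i  # deletions only
    for j in range(len(hyp) + 1):
        d[0][j] = j  # insertions only
    for i in range(1, len(ref) + 1):
        for j in range(1, len(hyp) + 1):
            sub = 0 if ref[i - 1] == hyp[j - 1] else 1
            d[i][j] = min(d[i - 1][j] + 1,        # deletion
                          d[i][j - 1] + 1,        # insertion
                          d[i - 1][j - 1] + sub)  # match / substitution
    return d[len(ref)][len(hyp)] / len(ref)
```

For example, dropping one word from a six-word reference yields a WER of 1/6.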

📈 Implementation of eight evaluation metrics to assess the similarity between two images. The eight metrics are as follows: RMSE, PSNR, SSIM, ISSM, FSIM, SRE, SAM, and UIQ.

  • Updated Aug 31, 2024
  • Python
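Two of the metrics named in that list, RMSE and PSNR, are simple to state. A minimal NumPy sketch, assuming images in the 0–255 range (an illustrative version, not the repository's implementation):

```python
import numpy as np

def rmse(a: np.ndarray, b: np.ndarray) -> float:
    """Root mean squared error between two same-shaped images."""
    diff = a.astype(np.float64) - b.astype(np.float64)
    return float(np.sqrt(np.mean(diff ** 2)))

def psnr(a: np.ndarray, b: np.ndarray, max_val: float = 255.0) -> float:
    """Peak signal-to-noise ratio in dB; infinite for identical images."""
    mse = np.mean((a.astype(np.float64) - b.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")
    return float(10 * np.log10(max_val ** 2 / mse))
```

Note that PSNR is derived directly from MSE, so the two metrics rank image pairs identically; SSIM and the other structural metrics in the list do not.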

Data-Driven Evaluation for LLM-Powered Applications

  • Updated Jan 22, 2025
  • Python

PyNLPl, pronounced as 'pineapple', is a Python library for Natural Language Processing. It contains various modules useful for common, and less common, NLP tasks. PyNLPl can be used for basic tasks such as the extraction of n-grams and frequency lists, and to build a simple language model. There are also more complex data types and algorithms. More…

  • Updated Sep 14, 2023
  • Python
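The basic tasks mentioned there, n-gram extraction and frequency lists, can be sketched in a few lines of standard-library Python (a generic illustration, not PyNLPl's own API):

```python
from collections import Counter

def ngrams(tokens: list[str], n: int) -> list[tuple[str, ...]]:
    """All contiguous n-grams of a token sequence, in order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

def frequency_list(tokens: list[str], n: int = 1) -> Counter:
    """Frequency count of n-grams, e.g. a unigram frequency list for n=1."""
    return Counter(ngrams(tokens, n))
```

A unigram frequency list (`n=1`) is the usual starting point for a simple count-based language model.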

[RAL '25 & IROS '25] MapEval: Towards Unified, Robust and Efficient SLAM Map Evaluation Framework.

  • Updated Jul 15, 2025
  • C++

Benchmark diffusion models faster. Automate evals, seeds, and metrics for reproducible results.

  • Updated Oct 3, 2025
  • Python

A list of works on evaluation of visual generation models, including evaluation metrics, models, and systems

  • Updated Sep 22, 2025

Source code for "Taming Visually Guided Sound Generation" (Oral at the BMVC 2021)

  • Updated Jul 12, 2024
  • Jupyter Notebook

RAG evaluation without the need for "golden answers"

  • Updated Dec 15, 2025
  • Python
