evaluation

Here are 1,801 public repositories matching this topic...

mlflow

The open source developer platform to build AI agents and models with confidence. Enhance your AI applications with end-to-end tracking, observability, and evaluations, all in one integrated platform.

  • Updated Dec 18, 2025
  • Python
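
For orientation, MLflow's core tracking API amounts to a few calls. The sketch below logs parameters and metrics for a single run; the experiment name, parameter, and metric values are made up for illustration, and a local ./mlruns directory is assumed as the tracking backend:

```python
import mlflow

# Point the client at an experiment (created if it doesn't exist);
# with no tracking server configured, runs go to a local ./mlruns directory.
mlflow.set_experiment("demo-evaluation")  # illustrative name

with mlflow.start_run(run_name="baseline"):
    # Log hyperparameters and evaluation metrics for this run.
    mlflow.log_param("model", "example-model")
    mlflow.log_metric("accuracy", 0.91)
    mlflow.log_metric("f1", 0.88)
```

Runs logged this way can then be compared in the MLflow UI (`mlflow ui`) or queried programmatically.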
langfuse

🪢 Open source LLM engineering platform: LLM observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, LangChain, OpenAI SDK, LiteLLM, and more. 🍊 YC W23

  • Updated Dec 17, 2025
  • TypeScript
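
As a rough sketch of the OpenAI SDK integration mentioned above, Langfuse documents a drop-in import that traces calls automatically. The model name and prompt below are placeholders, and Langfuse/OpenAI credentials are assumed to be set as environment variables:

```python
# Assumes LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY (and optionally LANGFUSE_HOST),
# plus OPENAI_API_KEY, are set in the environment.
from langfuse.openai import openai  # drop-in wrapper around the OpenAI SDK

response = openai.chat.completions.create(
    model="gpt-4o-mini",  # placeholder model name
    messages=[{"role": "user", "content": "Summarize what an eval is."}],
)
print(response.choices[0].message.content)
# The call is captured as a trace and shows up in the Langfuse UI.
```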

Debug, evaluate, and monitor your LLM applications, RAG systems, and agentic workflows with comprehensive tracing, automated evaluations, and production-ready dashboards.

  • Updated Dec 18, 2025
  • Python

Supercharge Your LLM Application Evaluations 🚀

  • Updated Dec 17, 2025
  • Python

🤘 awesome-semantic-segmentation

  • Updated May 8, 2021

Test your prompts, agents, and RAGs. AI Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

  • Updated Dec 18, 2025
  • TypeScript

LLM-powered framework for deep document understanding, semantic retrieval, and context-aware answers using RAG paradigm.

  • Updated Dec 17, 2025
  • Go

Easily fine-tune, evaluate and deploy gpt-oss, Qwen3, DeepSeek-R1, or any open source LLM / VLM!

  • Updated Dec 18, 2025
  • Python

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMA2, Qwen, GLM, Claude, etc.) across 100+ datasets.

  • Updated Dec 17, 2025
  • Python

Next-generation AI Agent Optimization Platform: Cozeloop addresses challenges in AI agent development by providing full-lifecycle management capabilities from development, debugging, and evaluation to monitoring.

  • Updated Dec 18, 2025
  • Go

🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓

  • Updated Dec 18, 2025
  • TypeScript
Kiln

AutoRAG

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

  • Updated Nov 20, 2025
  • Python
evo

Arbitrary expression evaluation for golang

  • Updated Mar 25, 2025
  • Go

Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks

  • Updated Dec 17, 2025
  • Python

One-for-All Multimodal Evaluation Toolkit Across Text, Image, Video, and Audio Tasks

  • Updated Dec 18, 2025
  • Python

SuperCLUE: A comprehensive benchmark for Chinese general-purpose large models | A Benchmark for Foundation Models in Chinese

  • Updated Sep 8, 2025
