evaluation

Here are 1,358 public repositories matching this topic...

🤘 awesome-semantic-segmentation

  • Updated May 8, 2021

🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23

  • Updated Mar 17, 2025
  • TypeScript

Supercharge Your LLM Application Evaluations 🚀

  • Updated Mar 15, 2025
  • Python

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

  • Updated Mar 18, 2025
  • TypeScript

OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama3, Mistral, InternLM2, GPT-4, LLaMa2, Qwen, GLM, Claude, etc.) across 100+ datasets.

  • Updated Mar 17, 2025
  • Python

Arbitrary expression evaluation for golang

  • Updated May 31, 2024
  • Go
AutoRAG

AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation

  • Updated Mar 3, 2025
  • Python
evo

🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓

  • Updated Mar 17, 2025
  • TypeScript
Kiln

The easiest tool for fine-tuning LLMs, synthetic data generation, and collaborating on datasets.

  • Updated Mar 18, 2025
  • Python

SuperCLUE: A comprehensive benchmark for Chinese general-purpose large models | A Benchmark for Foundation Models in Chinese

  • Updated May 23, 2024

A unified evaluation framework for large language models

  • Updated Feb 11, 2025
  • Python

An open-source visual programming environment for battle-testing prompts to LLMs.

  • Updated Mar 18, 2025
  • TypeScript

UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, embedding use-cases), perform root cause analysis on failure cases and give insights on how to resolve them.

  • Updated Aug 18, 2024
  • Python

Accelerating the development of large multimodal models (LMMs) with a one-click evaluation module, lmms-eval.

  • Updated Mar 16, 2025
  • Python

🤗 Evaluate: A library for easily evaluating machine learning models and datasets.

  • Updated Jan 10, 2025
  • Python
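
As an illustration of the workflow this library supports, here is a minimal sketch (the metric name and toy labels are chosen purely for illustration): load a metric by name, then score predictions against reference labels.

    import evaluate

    # Load a ready-made metric by name (here "accuracy") and score
    # toy predictions against reference labels.
    accuracy = evaluate.load("accuracy")
    result = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
    print(result)  # e.g. {'accuracy': 0.75}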

An open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.

  • Updated Mar 18, 2025
  • Python
