evaluation
Here are 1,358 public repositories matching this topic...
🤘 awesome-semantic-segmentation
Updated May 8, 2021
🪢 Open source LLM engineering platform: LLM Observability, metrics, evals, prompt management, playground, datasets. Integrates with OpenTelemetry, Langchain, OpenAI SDK, LiteLLM, and more. 🍊YC W23
Updated Mar 17, 2025 - TypeScript
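This description matches Langfuse. The sketch below is a hedged example of its Python SDK's decorator-based tracing (v2-era import path) and assumes LANGFUSE_PUBLIC_KEY, LANGFUSE_SECRET_KEY, and LANGFUSE_HOST are set in the environment.

```python
# Hedged sketch: trace an LLM-calling function with Langfuse's @observe
# decorator. Credentials are read from LANGFUSE_* environment variables.
from langfuse.decorators import observe


@observe()  # records this call as a trace (inputs, output, latency)
def answer(question: str) -> str:
    # Call your LLM of choice here; nested @observe functions become spans.
    return "stub answer to: " + question


print(answer("What does this platform track?"))
```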
Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.
Updated Mar 18, 2025 - TypeScript
OpenCompass is an LLM evaluation platform supporting a wide range of models (Llama 3, Mistral, InternLM2, GPT-4, LLaMA 2, Qwen, GLM, Claude, etc.) across 100+ datasets.
Updated Mar 17, 2025 - Python
AutoRAG: An Open-Source Framework for Retrieval-Augmented Generation (RAG) Evaluation & Optimization with AutoML-Style Automation
Updated Mar 3, 2025 - Python
Python package for the evaluation of odometry and SLAM
Updated Feb 18, 2025 - Python
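The odometry/SLAM package described here appears to be evo. A minimal sketch of computing absolute pose error between two TUM-format trajectories, using the Python API as I understand it from evo's documentation; the file names are placeholders.

```python
# Hedged sketch: absolute pose error (APE) between a reference and an
# estimated trajectory with evo. File names below are placeholders.
from evo.core import metrics, sync
from evo.tools import file_interface

traj_ref = file_interface.read_tum_trajectory_file("ground_truth.txt")
traj_est = file_interface.read_tum_trajectory_file("estimate.txt")

# Associate poses by timestamp before comparing the two trajectories.
traj_ref, traj_est = sync.associate_trajectories(traj_ref, traj_est)

ape = metrics.APE(metrics.PoseRelation.translation_part)
ape.process_data((traj_ref, traj_est))
print("APE RMSE:", ape.get_statistic(metrics.StatisticsType.rmse))
```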
🧊 Open source LLM observability platform. One line of code to monitor, evaluate, and experiment. YC W23 🍓
Updated Mar 17, 2025 - TypeScript
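This looks like Helicone, where the "one line of code" is typically a proxy base-URL swap on an existing OpenAI client. A hedged sketch, assuming the documented oai.helicone.ai gateway and a HELICONE_API_KEY environment variable:

```python
# Hedged sketch: route OpenAI traffic through the Helicone proxy so every
# request is logged and can be evaluated later. HELICONE_API_KEY is yours.
import os

from openai import OpenAI

client = OpenAI(
    base_url="https://oai.helicone.ai/v1",  # the "one line" change
    default_headers={"Helicone-Auth": f"Bearer {os.environ['HELICONE_API_KEY']}"},
)

resp = client.chat.completions.create(
    model="gpt-4o-mini",
    messages=[{"role": "user", "content": "Say hello"}],
)
print(resp.choices[0].message.content)
```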
Building a modern functional compiler from first principles. (http://dev.stephendiehl.com/fun/)
Updated Jan 11, 2021 - Haskell
The easiest tool for fine-tuning LLMs, generating synthetic data, and collaborating on datasets.
Updated Mar 18, 2025 - Python
Klipse is a JavaScript plugin for embedding interactive code snippets in tech blogs.
Updated Oct 1, 2024 - HTML
SuperCLUE: A comprehensive benchmark for general-purpose Chinese foundation models
Updated May 23, 2024
End-to-end Automatic Speech Recognition for Mandarin and English in TensorFlow
Updated Mar 24, 2023 - Python
A unified evaluation framework for large language models
Updated Feb 11, 2025 - Python
An open-source visual programming environment for battle-testing prompts to LLMs.
Updated Mar 18, 2025 - TypeScript
UpTrain is an open-source, unified platform to evaluate and improve generative AI applications. It provides grades for 20+ preconfigured checks (covering language, code, and embedding use cases), performs root-cause analysis on failure cases, and gives insights on how to resolve them.
Updated Aug 18, 2024 - Python
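A hedged sketch of running two of UpTrain's preconfigured checks on a single RAG-style sample; the class and enum names follow its README as I recall them, and the data and API key are placeholders.

```python
# Hedged sketch: score one question/context/response triple with two of
# UpTrain's preconfigured checks. Names and sample data are illustrative.
from uptrain import EvalLLM, Evals

eval_llm = EvalLLM(openai_api_key="sk-...")  # placeholder key

data = [{
    "question": "What is the capital of France?",
    "context": "France is a country in Europe. Its capital is Paris.",
    "response": "The capital of France is Paris.",
}]

results = eval_llm.evaluate(
    data=data,
    checks=[Evals.CONTEXT_RELEVANCE, Evals.RESPONSE_COMPLETENESS],
)
print(results)  # per-sample scores with short explanations
```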
Accelerating the development of large multimodal models (LMMs) with the one-click evaluation module lmms-eval.
Updated Mar 16, 2025 - Python
🤗 Evaluate: A library for easily evaluating machine learning models and datasets.
Updated Jan 10, 2025 - Python
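A minimal usage sketch of 🤗 Evaluate: load a metric by name and compute it over predictions and references.

```python
# Minimal sketch: load the "accuracy" metric and score some toy predictions.
import evaluate

accuracy = evaluate.load("accuracy")
result = accuracy.compute(predictions=[0, 1, 1, 0], references=[0, 1, 0, 0])
print(result)  # {'accuracy': 0.75}
```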
Open-source evaluation toolkit for large multi-modality models (LMMs), supporting 220+ LMMs and 80+ benchmarks.
Updated Mar 18, 2025 - Python