llm-eval

Here are 37 public repositories matching this topic...

Test your prompts, agents, and RAGs. Red teaming, pentesting, and vulnerability scanning for LLMs. Compare performance of GPT, Claude, Gemini, Llama, and more. Simple declarative configs with command line and CI/CD integration.

  • Updated Mar 17, 2025
  • TypeScript
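
The declarative configs this entry mentions are worth seeing concretely. Below is a minimal sketch of an eval config in that style; the field names follow promptfoo's documented promptfooconfig.yaml schema, but treat the provider IDs and assertion types as assumptions to verify against the project's docs.

```yaml
# Sketch of a declarative eval config (promptfooconfig.yaml style).
prompts:
  - "Answer in one sentence: {{question}}"

providers:
  - openai:gpt-4o-mini   # assumed provider ID
  - openai:gpt-4o        # a second model, compared on the same tests

tests:
  - vars:
      question: "What is the capital of France?"
    assert:
      - type: contains   # deterministic string check
        value: "Paris"
```

Pointing the CLI at a file like this (for example, `npx promptfoo@latest eval`) runs every prompt against every provider and reports pass/fail per assertion, which is what makes the command-line and CI/CD integration a one-liner.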

phoenix

giskard

datachain

ETL, Analytics, Versioning for Unstructured Data

  • Updated Mar 17, 2025
  • Python

UpTrain is an open-source unified platform to evaluate and improve Generative AI applications. We provide grades for 20+ preconfigured checks (covering language, code, and embedding use cases), perform root-cause analysis on failure cases, and give insights on how to resolve them.

  • Updated Aug 18, 2024
  • Python

Python SDK for running evaluations on LLM-generated responses

  • Updated Mar 17, 2025
  • Python

Generate ideal question-answer pairs for testing RAG

  • Updated Feb 25, 2025
  • Python
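
The pattern behind this entry is easy to sketch generically: ask a capable model to read a context passage and emit question-answer pairs grounded in it, which then serve as a RAG test set. A minimal Python sketch using the OpenAI client; the model name and prompt wording are illustrative assumptions, not this repo's API:

```python
import json
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def generate_qa_pairs(context: str, n: int = 3) -> list[dict]:
    """Ask a model for n question-answer pairs grounded in `context`."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed model; any capable chat model works
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": 'Return JSON of the form {"pairs": [{"question": "...", '
                        '"answer": "..."}]}. Every answer must be directly '
                        "supported by the context."},
            {"role": "user",
             "content": f"Context:\n{context}\n\nWrite {n} question-answer pairs."},
        ],
    )
    return json.loads(response.choices[0].message.content)["pairs"]
```

The generated pairs can then be replayed against the RAG pipeline, with retrieved-answer quality scored against the "ideal" answers.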

A simple GPT-based evaluation tool for multi-aspect, interpretable assessment of LLMs.

  • Updated Jan 29, 2024
  • Python
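
Multi-aspect, interpretable assessment usually means having a grader model score each aspect separately and justify each score. A generic sketch of that LLM-as-judge pattern (the aspects, grader model, and prompt are assumptions, not this repo's interface):

```python
import json
from openai import OpenAI

client = OpenAI()
ASPECTS = ["relevance", "coherence", "factuality"]  # illustrative aspects

def judge(prompt: str, answer: str) -> dict:
    """Score `answer` on each aspect (1-5), with a short rationale per score."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # assumed grader model
        response_format={"type": "json_object"},
        messages=[
            {"role": "system",
             "content": "Grade the answer on these aspects: "
                        + ", ".join(ASPECTS)
                        + '. Return JSON mapping each aspect to '
                          '{"score": <1-5>, "rationale": "..."}.'},
            {"role": "user",
             "content": f"Prompt: {prompt}\n\nAnswer: {answer}"},
        ],
    )
    return json.loads(response.choices[0].message.content)
```

Keeping a rationale next to each per-aspect score is what makes this style of evaluation interpretable rather than a single opaque grade.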

Python SDK for experimenting, testing, evaluating & monitoring LLM-powered applications - Parea AI (YC S23)

  • Updated Feb 13, 2025
  • Python

This benchmark compares Russian analogues of ChatGPT: Saiga, YandexGPT, Gigachat

  • Updated Sep 26, 2023
  • Jupyter Notebook
ragrank

🎯 A free LLM evaluation toolkit that helps you assess factual accuracy, context understanding, tone, and more, so you can see how good your LLM applications are.

  • Updated Jan 7, 2025
  • Python

This is an open-source project that lets you compare two LLMs head to head on a given prompt. This repository covers the project's backend, which allows LLM APIs to be incorporated and used by the front end.

  • Updated Mar 7, 2025
  • Python

Realign is a testing and simulation framework for AI applications.

  • Updated Dec 4, 2024
  • Python

Code for "Prediction-Powered Ranking of Large Language Models", NeurIPS 2024.

  • Updated Oct 28, 2024
  • Jupyter Notebook

Create an evaluation framework for your LLM-based app. Incorporate it into your test suite. Lay the monitoring foundation.

  • Updated Jan 14, 2025
  • Jupyter Notebook
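
One common way to do what this entry describes is to parametrize pytest over a set of eval cases and assert a score threshold, so evals run alongside ordinary tests. A sketch with a deliberately crude scorer; `my_app.answer` is a hypothetical function under test:

```python
import pytest

from my_app import answer  # hypothetical function under test

# A few inline eval cases; real suites usually load these from a dataset file.
CASES = [
    ("What is the capital of France?", "Paris"),
    ("Who wrote Hamlet?", "Shakespeare"),
]

def keyword_score(expected: str, actual: str) -> float:
    """Crude stand-in scorer: 1.0 if the expected string appears verbatim."""
    return 1.0 if expected.lower() in actual.lower() else 0.0

@pytest.mark.parametrize("question,expected", CASES)
def test_answer_meets_threshold(question, expected):
    # Swap keyword_score for an LLM-as-judge scorer once the basics pass;
    # keeping a threshold means regressions fail the build like any other test.
    assert keyword_score(expected, answer(question)) >= 1.0
```

Logging each case's score from the same hook is the natural place to lay the monitoring foundation the description mentions.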

The prompt engineering, prompt management, and prompt evaluation tool for Python

  • Updated Sep 17, 2024
  • Python

The prompt engineering, prompt management, and prompt evaluation tool for TypeScript, JavaScript, and Node.js.

  • Updated Sep 14, 2024
  • TypeScript


