#ai-evaluation

Here are 14 public repositories matching this topic...

Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers.

  • Updated Mar 20, 2025
  • HTML

Ranking LLMs on agentic tasks

  • Updated Mar 12, 2025
  • Jupyter Notebook

Vivaria is METR's tool for running evaluations and conducting agent elicitation research.

  • Updated Mar 22, 2025
  • TypeScript

Open multiple AI sites with one click and view the AI results

  • Updated Jan 21, 2025

Benchmark evaluating LLMs on their ability to create and resist disinformation. Includes comprehensive testing across major models (Claude, GPT-4, Gemini, Llama, etc.) with standardized evaluation metrics.

  • Updated Mar 20, 2025

Adaptive Testing Framework for AI Models (Psychometrics in AI Evaluation)

  • Updated Oct 1, 2024
  • Jupyter Notebook

ROC methodology explained with R examples

  • Updated Apr 25, 2024
  • TeX

FROC methodology explained with R examples

  • Updated Dec 26, 2023
  • TeX

RJafroc quick start for those already familiar with Windows JAFROC

  • Updated Dec 28, 2023
  • TeX

Installation files for Windows JAFROC software

  • Updated Feb 8, 2023

Repository for the LWDA'24 presentation on 'Psychometric Profiling of GPT Models for Bias Exploration', featuring conference materials including the poster, paper, slides, and references.

  • Updated Sep 23, 2024
  • TeX

ROC/FROC datasets from my collaborations

  • Updated Aug 14, 2023
