ai-evaluation
Here are 14 public repositories matching this topic...
Hallucinations (Confabulations) Document-Based Benchmark for RAG. Includes human-verified questions and answers.
Updated Mar 20, 2025 - HTML
Ranking LLMs on agentic tasks
Updated Mar 12, 2025 - Jupyter Notebook
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
Updated Mar 22, 2025 - TypeScript
One click to open multiple AI sites and view their results side by side
Updated Jan 21, 2025
Benchmark evaluating LLMs on their ability to create and to resist disinformation. Includes testing across major models (Claude, GPT-4, Gemini, Llama, etc.) with standardized evaluation metrics.
Updated Mar 20, 2025
Code scanner to check for issues in prompts and LLM calls
Updated Mar 22, 2025 - Python
JudgeGPT - a research project on (fake) news evaluation
Updated Feb 25, 2025 - Python
Adaptive Testing Framework for AI Models (Psychometrics in AI Evaluation)
Updated Oct 1, 2024 - Jupyter Notebook
RJafroc quick start for those already familiar with Windows JAFROC
Updated Dec 28, 2023 - TeX
Repository for the LWDA'24 presentation "Psychometric Profiling of GPT Models for Bias Exploration", featuring conference materials including the poster, paper, slides, and references.
Updated Sep 23, 2024 - TeX