Model Evaluation and Threat Research (METR)

METR is a research nonprofit that works on assessing whether cutting-edge AI systems could pose catastrophic risks to society.

We build the science of accurately assessing risks, so that humanity is informed before developing transformative AI systems.

Our Software

Vivaria
Public Task Suite
RE-Bench Task Suite
Some of our open-source agents can be found at github.com/poking-agents

Popular repositories

task-standard Public
METR Task Standard
TypeScript, 146 stars, 32 forks
public-tasks Public
HTML, 87 stars, 9 forks
vivaria Public
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
TypeScript, 85 stars, 31 forks
RE-Bench Public
Python, 66 stars, 6 forks
eval-analysis-public Public
Public repository containing METR's DVC pipeline for eval data analysis
Python, 30 stars, 5 forks
task-template Public template
TypeScript, 9 stars, 6 forks

Showing 10 of 28 repositories

eval-analysis-public Public
Public repository containing METR's DVC pipeline for eval data analysis
Python, 30 stars, 5 forks, 5 issues, 1 pull request, updated Mar 24, 2025
vivaria Public
Vivaria is METR's tool for running evaluations and conducting agent elicitation research.
TypeScript, 85 stars, MIT license, 31 forks, 229 issues (3 issues need help), 17 pull requests, updated Mar 24, 2025
inspect_k8s_sandbox Public (forked from UKGovernmentBEIS/inspect_k8s_sandbox)
A Kubernetes sandbox environment for use with inspect_ai
Python, 0 stars, MIT license, 5 forks, 0 issues, 1 pull request, updated Mar 21, 2025
autonomy-evals-guide Public
SCSS, 3 stars, MIT license, 4 forks, 0 issues, 2 pull requests, updated Mar 20, 2025
inspect_evals Public (forked from UKGovernmentBEIS/inspect_evals)
Collection of evals for Inspect AI
Python, 0 stars, MIT license, 103 forks, 0 issues, 0 pull requests, updated Mar 19, 2025
task-protected-scoring Public
Python, 0 stars, 1 fork, 3 issues, 2 pull requests, updated Mar 18, 2025
uplift_clone_hypothesis Public (forked from HypothesisWorks/hypothesis)
Hypothesis is a powerful, flexible, and easy to use library for property-based testing.
Python, 0 stars, 628 forks, 3 issues, 1 pull request, updated Mar 18, 2025
hcast-public Public
HTML, 8 stars, 1 fork, 0 issues, 0 pull requests, updated Mar 17, 2025
public-tasks Public
HTML, 87 stars, 9 forks, 1 issue, 2 pull requests, updated Mar 16, 2025
task-assets Public
Python, 0 stars, 0 forks, 1 issue, 0 pull requests, updated Mar 8, 2025

People

This organization has no public members. You must be a member to see who’s a part of this organization.
