SWE-bench
📣 New benchmark: CodeClash (website, GitHub) evaluates SWE agents on goals, not tasks.
📣 New: Meet mini, the 100-line AI agent that still gets 65% on SWE-bench Verified!
This organization contains the source code for several projects in the SWE-* open source ecosystem, including:
- SWE-bench, a benchmark for evaluating AI systems on real-world GitHub issues (see the data-loading sketch after this list).
- SWE-agent, a system that automatically solves GitHub issues using an LM agent.
- SWE-smith, a toolkit for generating SWE training data at scale.
- mini, an AI agent written in just 100 lines of code that scores >70% on SWE-bench Verified.
Also check out the supporting infrastructure for working with SWE-* projects.
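For a concrete sense of what a SWE-bench instance looks like, here is a minimal sketch of loading the Verified split from the Hugging Face Hub. The dataset name (princeton-nlp/SWE-bench_Verified) and the field names shown are assumptions drawn from the public dataset release, not something stated on this page:

```python
# Minimal sketch, assuming the public princeton-nlp/SWE-bench_Verified
# dataset on the Hugging Face Hub; field names may vary across releases.
from datasets import load_dataset

# SWE-bench Verified ships its human-validated instances as a "test" split.
ds = load_dataset("princeton-nlp/SWE-bench_Verified", split="test")

example = ds[0]
print(example["instance_id"])              # unique task ID, e.g. "<org>__<repo>-<number>"
print(example["repo"])                     # the source GitHub repository
print(example["problem_statement"][:200])  # the issue text an agent must resolve
```

Each instance pairs a repository snapshot with a real GitHub issue, so an agent is judged on whether its patch actually resolves the issue under the benchmark's tests.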
Pinned
- experiments (Public): Open-sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.
Repositories
- SWE-bench (Public): A benchmark for evaluating AI systems on real-world GitHub issues.
- SWE-smith-envs (Public): Artifacts for building environments (Docker images) for repositories represented in SWE-smith.
- experiments (Public): Open-sourced predictions, execution logs, trajectories, and results from model inference + evaluation runs on the SWE-bench task.
- reading-list (Public)
- .github (Public)
- humanevalfix-results (Public archive): Evaluation data + results for SWE-agent inference on the HumanEvalFix task.