SWE-bench

Organization for maintaining SWE-bench and related projects

📣 New benchmark: CodeClash (website, github) evaluates SWE agents on goals, not tasks
📣 New: Meet mini, the 100 line AI agent that still gets 65% on SWE-bench verified!

Software engineering agents, benchmarks, and models.
Built and maintained by researchers from Stanford University and Princeton University.

HuggingFace | Slack | YouTube


This organization contains the source code for several projects in the SWE-* open source ecosystem, including:

  • SWE-bench, a benchmark for evaluating AI systems on real-world GitHub issues.
  • SWE-agent, a system that automatically solves GitHub issues using an LM agent.
  • SWE-smith, a toolkit for generating SWE training data at scale.
  • mini, an AI agent written in just 100 lines of code that scores >70% on SWE-bench verified.

Also check out the supporting infrastructure for working with SWE-* projects:

  • SWE-ReX, infrastructure supporting sandboxed code execution for AI agents.
  • sb-cli, a command line interface for running evaluations on the cloud.
  • Mirror clones for the SWE-bench and SWE-smith repositories are available here and here.

Pinned

  1. SWE-bench (Public)

    SWE-bench: Can Language Models Resolve Real-world Github Issues?

    Python · 4k stars · 714 forks

  2. SWE-smith (Public)

    [NeurIPS 2025 D&B Spotlight] Scaling Data for SWE-agents

    Python · 488 stars · 90 forks

  3. experiments (Public)

    Open-sourced predictions, execution logs, trajectories, and results from model inference and evaluation runs on the SWE-bench task.

    Shell · 228 stars · 277 forks

  4. sb-cli (Public)

    Run SWE-bench evaluations remotely

    Python · 46 stars · 5 forks
