SWE-agent
SWE-agent takes a GitHub issue and tries to automatically fix it, using your LM of choice. It can also be employed for offensive cybersecurity or competitive coding challenges. [NeurIPS 2024]
Star Growth
Overview
SWE-agent是一个由普林斯顿大学和斯坦福大学研究人员开发的开源工具,能够使用大语言模型(如GPT-4o、Claude Sonnet 4)自主修复GitHub仓库中的实际问题。该工具在SWE-bench基准测试中达到了开源项目的最先进水平,通过单个YAML文件进行配置,为语言模型提供了最大的自主权来使用各种工具解决编程问题。SWE-agent不仅可以自动修复代码问题,还能用于网络安全漏洞发现和竞赛编程挑战。该项目专为研究设计,架构简单且易于修改。值得注意的是,开发团队目前主要专注于mini-swe-agent项目,这是一个更简洁的继任者,在保持相同性能的同时大大简化了实现。SWE-agent展示了AI在软件工程自动化方面的潜力,特别是在代码修复和漏洞检测领域,为研究人员和开发者提供了一个强大的工具来探索自主编程代理的能力边界。
Deep Analysis
Princeton/Stanford research project achieving SoTA on SWE-bench — the most rigorous benchmark for automated software engineering — with a simple, hackable design that leaves maximal agency to the LLM
⚡ Capabilities
- • Autonomous software engineering agent for GitHub issues
- • State-of-the-art on SWE-bench benchmarks
- • Configurable via single YAML file
- • Offensive cybersecurity (CTF) capabilities (EnIGMA)
- • Custom task support beyond code fixing
- • Support for multiple LLMs (GPT-4o, Claude, etc.)
🔗 Integrations
✓ Best For
- ✓ Automated bug fixing and issue resolution in GitHub repos
- ✓ Research on AI-driven software engineering capabilities
✗ Not Ideal For
- ✗ General-purpose AI assistants (not designed for chat)
- ✗ Teams wanting actively maintained tooling (consider mini-SWE-agent)
Languages
Deployment
Pricing Detail
⚠ Known Limitations
- ⚠ Now in maintenance mode — mini-SWE-agent is the successor
- ⚠ Requires powerful LLM (GPT-4o/Claude) for good results — API costs can be significant
- ⚠ Limited to code-related tasks — not a general-purpose agent
- ⚠ Complex setup for custom environments
Pros
- + 在SWE-bench基准测试中达到开源项目的最先进性能水平
- + 支持多种主流大语言模型(GPT-4o、Claude Sonnet 4等),配置灵活
- + 专为研究设计,架构简单且文档完善,易于定制和扩展
Cons
- - 开发重心已转移到mini-swe-agent项目,原项目维护可能受到影响
- - 主要面向研究用途,生产环境的稳定性和可靠性可能不如商业解决方案
Use Cases
- • 自动修复GitHub仓库中的代码问题和bug
- • 网络安全领域的漏洞发现和渗透测试
- • 竞赛编程和算法挑战的自动化解决