Naman Jain*,1, Jaskirat Singh*,2, Manish Shetty1, Liang Zheng2, Koushik Sen1, Ion Stoica1
1UC Berkeley, 2ANU *Equal contribution, ^Equal supervision
💻 Code • 📃 Paper • 🤗 Data & Models • 🌐 Project Page
We present R2E-Gym, the largest procedurally curated environment for training real-world SWE-Agents. We show that R2E-Gym enables better scaling at both training time and test time, achieving 51% on the SWE-Bench Verified benchmark. This sets a new state of the art for open-weight SWE-Agents and is, for the first time, competitive with proprietary models such as o1 and sonnet-3.5-v2 with tools.
R2E-Gym is powered by two main contributions: (a) SWE-GEN: a synthetic data curation recipe for building executable training environments without relying on human-written tests and issues. (b) Hybrid Inference Time Scaling: we show that while both execution-based and execution-free verifiers yield inference-time gains, significantly better performance can be achieved by leveraging the strengths of both. Overall, the final approach achieves SOTA performance for open-weight SWE-Agents while also being competitive with some proprietary model baselines.
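As a rough illustration of the hybrid idea, the sketch below reranks candidate patches by blending an execution-based signal (fraction of generated tests passed) with an execution-free verifier score. It is not the R2E-Gym implementation: the `Candidate` fields, the `hybrid_score` function, and the linear weighting are all assumptions made for this example.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Candidate:
    """One candidate patch produced by an agent rollout (hypothetical structure)."""
    patch_id: str
    exec_free_score: float  # e.g. a learned verifier's estimate in [0, 1]
    tests_passed: int       # generated tests that pass with the patch applied
    tests_total: int


def hybrid_score(c: Candidate, weight: float = 0.5) -> float:
    """Blend both signals; the linear mix is an assumption, not the paper's rule."""
    exec_based = c.tests_passed / c.tests_total if c.tests_total else 0.0
    return weight * exec_based + (1.0 - weight) * c.exec_free_score


def pick_best(candidates: List[Candidate], weight: float = 0.5) -> Candidate:
    """Return the candidate with the highest blended score."""
    return max(candidates, key=lambda c: hybrid_score(c, weight))


candidates = [
    Candidate("a", exec_free_score=0.9, tests_passed=1, tests_total=4),
    Candidate("b", exec_free_score=0.6, tests_passed=4, tests_total=4),
    Candidate("c", exec_free_score=0.2, tests_passed=4, tests_total=4),
]
best = pick_best(candidates)
```

In this toy setup the execution-free score alone would prefer patch `a`, and the test results alone cannot separate `b` from `c`; blending the two resolves both cases.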
Please refer to our GitHub repo for detailed notes on Gym Environment Usage, Training, Inference, and Executable SWE Environment Generation.
@misc{jain2025r2e-gym,
title={R2E-Gym: Procedural Environment Generation and Hybrid Verifiers for Scaling Open-Weights SWE Agents},
author={Jain, Naman and Singh, Jaskirat and Shetty, Manish and Zheng, Liang and Sen, Koushik and Stoica, Ion},
year={2025},
eprint={xxx.xxxx},
archivePrefix={arXiv},
primaryClass={cs.SE},
url={https://arxiv.org/abs/xxx.xxxx},
}