SWE-rebench Leaderboard (Feb 2026): GPT-5.4, Qwen3.5, Gemini 3.1 Pro, Step-3.5-Flash And More
March 23, 2026
2026 Insights

This update replaces demonstrations and the old 80-step limit with a 128k-token context window and auxiliary interfaces.
Claude Opus 4.6 remains the top performer, while GLM-5, DeepSeek-V3.2, GPT-5.4, and Qwen3-Coder-Next stand out for efficiency, coverage, or ultra-large context. With open-source models now matching proprietary ones, large-context handling and caching have become pivotal.
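To make the switch from a step limit to a context budget concrete, here is a minimal, hypothetical Python sketch of how an agent episode might terminate under each rule. It is not SWE-rebench's actual harness. The function names, the `next_message` callback, and the rough 4-characters-per-token estimate are all illustrative assumptions, and "128k" is read here as 128,000 tokens.

```python
# Hypothetical sketch: how an agent episode might terminate under a fixed step
# limit versus a context-token budget. Not SWE-rebench's actual harness.

OLD_STEP_LIMIT = 80              # the previous per-episode step cap
CONTEXT_BUDGET_TOKENS = 128_000  # assumption: "128k" read as 128,000 tokens


def approx_tokens(text: str) -> int:
    """Crude estimate of ~4 characters per token (assumption);
    a real harness would use the model's own tokenizer."""
    return max(1, len(text) // 4)


def run_episode(next_message, budget_tokens=None, max_steps=None):
    """Collect messages until the agent submits or a limit is hit.

    next_message(step) is a hypothetical callback that returns the next
    message string, or None once the agent submits.
    Returns (messages, stop_reason).
    """
    messages, used, step = [], 0, 0
    while True:
        # Old rule: stop after a fixed number of steps, however small they are.
        if max_steps is not None and step >= max_steps:
            return messages, "step_limit"
        msg = next_message(step)
        if msg is None:
            return messages, "submitted"
        cost = approx_tokens(msg)
        # New rule: stop when the next message would overflow the context window.
        if budget_tokens is not None and used + cost > budget_tokens:
            return messages, "context_exhausted"
        messages.append(msg)
        used += cost
        step += 1


if __name__ == "__main__":
    def short(step):
        return "x" * 400     # ~100 tokens per step

    def large(step):
        return "x" * 40_000  # ~10,000 tokens per step

    for name, gen in [("short", short), ("large", large)]:
        by_steps = run_episode(gen, max_steps=OLD_STEP_LIMIT)
        by_tokens = run_episode(gen, budget_tokens=CONTEXT_BUDGET_TOKENS)
        print(name, len(by_steps[0]), by_steps[1], len(by_tokens[0]), by_tokens[1])
```

With ~100-token steps, the step cap ends the run after 80 steps (about 8,000 tokens) while the token budget allows 1,280 steps. With ~10,000-token steps, the cap would still permit 80 steps (about 800,000 tokens, far more than fits in context), whereas the budget stops after 12. Under a token budget, how efficiently a model uses its context directly determines how much work it can do.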
