SWE-rebench Leaderboard (Feb 2026): GPT-5.4, Qwen3.5, Gemini 3.1 Pro, Step-3.5-Flash And More

March 23, 2026 | 10 min read

2026 Insights: The leaderboard update replaces demos and the old 80‑step limit with a 128k‑token context window and auxiliary interfaces.

Claude Opus 4.6 remains top‑performing, while GLM‑5, DeepSeek‑V3.2, GPT‑5.4, and Qwen3‑Coder‑Next stand out for efficiency, coverage, or ultra‑large context. Open‑source models now match proprietary ones, making large‑context handling and caching pivotal.
