Cursor's Composer 2.5 scores 79.8% on SWE-Bench Multilingual at $0.50 per million input tokens, matching Opus 4.7 and GPT-5.5 at one-thirtieth of Opus 4.7's input price.
Cursor's Composer 2.5 scores 79.8% on SWE-Bench Multilingual, matching Anthropic's Opus 4.7 and OpenAI's GPT-5.5. It costs $0.50 per million input tokens versus Opus 4.7's $15.
Key facts
- Composer 2.5 scores 79.8% on SWE-Bench Multilingual
- Costs $0.50/M input tokens vs Opus 4.7's $15/M
- Trained on 25x more synthetic tasks than Composer 2
- 85% of compute budget went to extra training and RL
- Successor model training on Colossus-2 with 1M H100 equivalents
Cursor shipped Composer 2.5, a major upgrade to its in-house AI coding model built on the open-source Kimi K2.5 checkpoint from Moonshot AI. According to the company's blog post, the model was trained on 25 times more synthetic tasks than its predecessor, Composer 2, and 85 percent of the compute budget went toward extra training and reinforcement learning.
On SWE-Bench Multilingual (79.8 percent) and CursorBench v3.1 (63.2 percent), Composer 2.5 matches Anthropic's Opus 4.7 and OpenAI's GPT-5.5. The notable part is not that a smaller model matches the frontier labs; it is that Cursor does so at an input price 30 times lower than Opus 4.7's ($0.50 versus $15 per million tokens) and 20 times lower than GPT-5.5's ($10 per million tokens), which shifts the economics of AI coding tools.
Pricing and variants
The standard Composer 2.5 costs $0.50 per million input tokens and $2.50 per million output tokens. A faster variant with the same benchmark performance costs $3.00 per million input tokens and $15.00 per million output tokens. For comparison, Anthropic charges $15 per million input tokens for Opus 4.7, according to its pricing page, and OpenAI's GPT-5.5 costs $10 per million input tokens, per OpenAI's API docs.
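To make the price gap concrete, here is a minimal Python sketch of the arithmetic using only the per-million-token prices quoted above. The function names and the example workload (2 million input tokens, 400,000 output tokens) are illustrative assumptions, not figures from Cursor. Output prices for Opus 4.7 and GPT-5.5 are not given here, so the cross-vendor comparison covers input tokens only.

```python
# Input prices in USD per million tokens, as quoted in this article.
INPUT_PRICE_PER_M = {
    "Composer 2.5": 0.50,
    "Composer 2.5 (fast)": 3.00,
    "GPT-5.5": 10.00,
    "Opus 4.7": 15.00,
}

# Output prices are only quoted for the two Composer 2.5 variants.
COMPOSER_OUTPUT_PRICE_PER_M = {
    "Composer 2.5": 2.50,
    "Composer 2.5 (fast)": 15.00,
}


def input_cost_ratio(model: str, baseline: str = "Composer 2.5") -> float:
    """How many times more expensive `model` is than `baseline` on input tokens."""
    return INPUT_PRICE_PER_M[model] / INPUT_PRICE_PER_M[baseline]


def composer_job_cost(variant: str, input_tokens: int, output_tokens: int) -> float:
    """Total USD cost of a job on a Composer 2.5 variant (input plus output)."""
    input_cost = input_tokens / 1_000_000 * INPUT_PRICE_PER_M[variant]
    output_cost = output_tokens / 1_000_000 * COMPOSER_OUTPUT_PRICE_PER_M[variant]
    return input_cost + output_cost


if __name__ == "__main__":
    for model in ("Opus 4.7", "GPT-5.5", "Composer 2.5 (fast)"):
        print(f"{model}: {input_cost_ratio(model):.0f}x Composer 2.5 input price")

    # Hypothetical workload, chosen only for illustration.
    for variant in COMPOSER_OUTPUT_PRICE_PER_M:
        cost = composer_job_cost(variant, 2_000_000, 400_000)
        print(f"{variant}: ${cost:.2f}")
```

On input pricing alone, the gap is 30x against Opus 4.7 and 20x against GPT-5.5. The fast variant costs six times the standard one on both input and output tokens.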
Composer 2.5 is live in Cursor now. The company did not disclose the exact training cost but noted that a much larger successor model is already in training with SpaceX and xAI, using ten times the compute on the Colossus-2 cluster with one million H100 equivalents. SpaceX had previously announced plans to acquire Cursor for $60 billion.
What this means for the coding tools market
Cursor's approach — fine-tuning an open-source checkpoint with massive synthetic data and RL — directly challenges the narrative that only frontier labs with $10B+ training runs can produce top-tier coding models. If Composer 2.5 maintains this performance in production, it pressures Anthropic and OpenAI to justify their premium pricing for coding tasks.
What to watch
Watch for independent benchmark reproductions of Composer 2.5 on SWE-Bench Verified, and whether Cursor publishes a technical report detailing the synthetic data generation pipeline. The successor model trained with SpaceX and xAI on Colossus-2 could set a new bar for coding benchmarks in Q3 2026.
Originally published on gentic.news
