This article was originally published on aicoderscope.com
Windsurf started life as Codeium — a free-tier autocomplete contender trying to undercut GitHub Copilot on price. In December 2025, Cognition AI, the company behind Devin (the $500/month autonomous coding agent), bought it for approximately $250 million. Since then the product has pivoted hard: from "cheap autocomplete" to "agentic IDE with a proprietary frontier model."
That pivot is either exciting or alarming depending on what you need from a coding tool. This review covers what Windsurf is in May 2026 — after two major post-acquisition releases, a new model family, and a pricing restructure.
What changed when Cognition took over
At acquisition, Windsurf had $82 million in ARR, over 350 enterprise customers, and a 210-person team. Cognition's play was straightforward: they had Devin, a headless autonomous coding agent; Windsurf gave them a local IDE, a large existing developer user base, and a payment relationship with people already buying AI tools.
The integration landed as Windsurf 2.0 on April 15, 2026:
- Devin Cloud integration — Devin can now run autonomous tasks directly from the IDE, managed through the new Agent Command Center
- Agent Command Center — Kanban-style panel for managing multiple Cascade and Devin sessions simultaneously
- Devin for Terminal (April 28) — Devin runs inside your local terminal with full codebase access, not just cloud-isolated containers
- Devin Review (May 6, available to all users) — automated code review on any pull request without manually initiating a Cascade session
The SWE-1 model family shipped alongside these integrations. SWE-1.5 was the first release; SWE-1.6 followed with a gain of more than 10% on SWE-Bench Pro and behavioral tuning aimed at agent workflows.
Cascade: the reason developers stay
Cascade is Windsurf's core agent mode. The distinction from a standard chat panel matters: it reads your entire repository, tracks edits you've made during the session, and executes multi-step tasks across multiple files from a single instruction.
A DevToolsReview test on a production codebase had Cascade identify 11 relevant endpoints across 4 router files during a refactoring session — without any manual context-feeding. That codebase-awareness is the capability driving adoption.
Where Cascade earns its keep:
- Multi-file refactors — works well when the scope is clear up front
- Codemaps — AI-annotated visual maps of code structure with grouped sections and precise line-level links; useful for understanding unfamiliar codebases before making changes
- Fast Context via SWE-grep — Windsurf claims 10× faster relevant-code retrieval compared to standard agentic search
- Session memory — Cascade tracks context between sessions on the same project, not just within a single conversation
The documented failure mode: when Cascade goes wrong mid-task, recovery is expensive. There's no partial correction mechanism. You can't say "steps 1–3 were right, redo only step 4." A wrong turn almost always forces a full restart from a clean state. Cascade also crashes during long-running agent sequences, particularly with Turbo Mode active and during background codebase indexing. Multiple changelog entries from March and April 2026 specifically address conversation crashes: v2.1.32 fixed several, and v2.3.9 in May addressed more stability issues.
For 3-file changes, Cascade is impressive. For 30-file architectural refactors, the crash risk is real enough that you want frequent commits before every Cascade session.
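One way to make the "commit before every Cascade session" habit cheap is a small checkpoint helper. The sketch below is an illustration, not a Windsurf feature: the function names and the commit message format are hypothetical, and it only wraps standard git commands (`git add --all`, `git commit --allow-empty`).

```python
import subprocess


def checkpoint_commands(label: str) -> list[list[str]]:
    """Build the git commands for a checkpoint commit taken before an agent session.

    The message format is a hypothetical convention chosen for this sketch.
    """
    message = f"checkpoint: before {label}"
    return [
        # Stage everything, including new and deleted files.
        ["git", "add", "--all"],
        # --allow-empty records a checkpoint even when the working tree is clean.
        ["git", "commit", "--allow-empty", "-m", message],
    ]


def make_checkpoint(label: str, repo_dir: str = ".") -> None:
    """Run the checkpoint commands in repo_dir, raising if any git call fails."""
    for command in checkpoint_commands(label):
        subprocess.run(command, cwd=repo_dir, check=True)


# Example (not executed here): make_checkpoint("cascade router refactor")
```

If a session goes wrong before anything else is committed, `git reset --hard HEAD` restores tracked files to the checkpoint, and `git clean -fd` removes untracked files the agent created. Both commands are destructive, so review `git status` first.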
SWE-1.6: Cognition's proprietary model
The SWE-1.6 model is technically the most interesting thing Windsurf has. Cognition trains it end-to-end via reinforcement learning on real task environments using a Cascade agent harness on top of an open-source base model. The result is a model that behaves more like a software agent than a chat model.
| Metric | SWE-1.6 |
|---|---|
| Speed (free tier) | 200 tok/s via Fireworks |
| Speed (paid tier) | 950 tok/s via Cerebras |
| SWE-Bench Pro vs SWE-1.5 | More than 10% improvement |
| Current availability | Free for 3 months from release |
950 tokens per second is fast enough to notice in real sessions. Cognition benchmarks SWE-1.5 at 6× faster than Claude Haiku 4.5 and 13× faster than Claude Sonnet 4.5 — SWE-1.6 matches that speed profile. Cascade responses at this speed feel interactive.
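To put the two throughput figures in concrete terms, here is a rough back-of-the-envelope calculation. The 2,000-token response size is a hypothetical value chosen for illustration, and the estimate assumes a steady decode rate, ignoring time to first token, queueing, and tool-call latency.

```python
def generation_seconds(output_tokens: int, tokens_per_second: float) -> float:
    """Time to stream a response at a steady decode rate (no queueing or tool calls)."""
    return output_tokens / tokens_per_second


FREE_TIER_TPS = 200      # free-tier figure quoted in the table (Fireworks)
PAID_TIER_TPS = 950      # paid-tier figure quoted in the table (Cerebras)
RESPONSE_TOKENS = 2_000  # hypothetical response size, for illustration only

free_seconds = generation_seconds(RESPONSE_TOKENS, FREE_TIER_TPS)
paid_seconds = generation_seconds(RESPONSE_TOKENS, PAID_TIER_TPS)

print(f"free tier: {free_seconds:.1f} s")                 # free tier: 10.0 s
print(f"paid tier: {paid_seconds:.1f} s")                 # paid tier: 2.1 s
print(f"speedup:   {free_seconds / paid_seconds:.2f}x")   # speedup:   4.75x
```

Under these assumptions the same response takes about ten seconds on the free tier and about two on the paid tier, which is roughly the gap between waiting on an answer and reading it as it arrives.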
The behavioral improvements in SWE-1.6 translate directly to better Cascade sessions: it uses parallel tool calls more often, loops less, and reaches for its own tools rather than dropping to the terminal for file operations. Cognition also added a length penalty during training to discourage verbosity, which cuts unnecessary back-and-forth in long tasks.
SWE-1.6 is proprietary software. You cannot run it locally, cannot use it with another IDE, and its post-free-period pricing is unannounced. If it becomes a paid add-on, the value math at $20/mo changes.
The model roster: widest in the market
Beyond SWE-1.6, Windsurf offers access to more frontier models in a single IDE than any other coding tool currently shipping:
- Anthropic: Claude Opus 4.7 with Fast Mode (~2.5× output speed, added May 12), Claude Opus 4.6 Thinking, Claude Sonnet 4.6 Thinking
- OpenAI: GPT-4o, GPT-5 family with Low/Medium/High/XHigh thinking levels, fast priority options
- Google: Gemini Flash, Gemini Pro variants with configurable reasoning intensity
- Windsurf native: SWE-1.5, SWE-1.6 (free tier), Adaptive ($0.50/$2.00 input/output per million tokens)
- Others: xAI Grok, DeepSeek V4 ($1.74/$3.48 per million tokens), Moonshot Kimi K2.6 ($0.95/$4.00), GLM-5.1
On Pro ($20/mo), extra usage beyond the plan quota is billed at API price through Windsurf's billing layer. This differs from Cursor's credit system — it's a metered model with a monthly base, so heavy agent usage on expensive models can add up mid-month.
If you've been managing separate API keys for Claude, OpenAI, and Gemini to route tasks to the right model, Windsurf's unified billing is genuinely convenient.
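Because overage is billed at API price, it can help to estimate a month's bill before committing to heavy agent use. The sketch below uses the per-million-token prices listed above and the $20 Pro base. The token volumes are hypothetical, and since the article does not state how much usage the plan quota covers, the function only prices tokens you already know fall outside it.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ModelPrice:
    """USD per million tokens, as listed in the model roster above."""
    input_per_million: float
    output_per_million: float


ADAPTIVE = ModelPrice(input_per_million=0.50, output_per_million=2.00)
DEEPSEEK_V4 = ModelPrice(input_per_million=1.74, output_per_million=3.48)

PRO_BASE_USD = 20.0  # Pro plan monthly price


def overage_cost(price: ModelPrice, input_tokens: int, output_tokens: int) -> float:
    """Cost of tokens consumed beyond the plan quota, billed at API price."""
    return (
        input_tokens / 1_000_000 * price.input_per_million
        + output_tokens / 1_000_000 * price.output_per_million
    )


# Hypothetical overage for one month: 40M input tokens, 5M output tokens.
OVER_INPUT, OVER_OUTPUT = 40_000_000, 5_000_000

for name, price in [("Adaptive", ADAPTIVE), ("DeepSeek V4", DEEPSEEK_V4)]:
    extra = overage_cost(price, OVER_INPUT, OVER_OUTPUT)
    print(f"{name}: overage ${extra:.2f}, total ${PRO_BASE_USD + extra:.2f}")
# Adaptive: overage $30.00, total $50.00
# DeepSeek V4: overage $87.00, total $107.00
```

The same overage volume costs almost three times as much on one model as on the other, which is why model choice matters more than the base price once you exceed the quota.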
Tab autocomplete: the weakest link
Tab is Windsurf's inline autocomplete — next-edit prediction rather than next-token completion. It predicts where you're going based on recent edits, suggests multi-line completions, and fills out implementations from function signatures.
The problem is consistency. DevToolsReview measured Tab at 53–60% usability versus 70–75% for Cursor and GitHub Copilot. The latency is visible. Completions sometimes fail to trigger in obvious situations — a function signature followed by an obvious implementation, for instance, where Cursor would fill confidently. Windsurf stutters.
For a feature you interact with on every single keystroke, these inconsistencies accumulate into friction during deep work sessions.
Tab is unlimited on all plans including Free. Windsurf is still a viable free autocomplete tool if the quality gap doesn't bother you. But if autocomplete quality is your primary criterion, Cursor and GitHub Copilot are ahead.
Pricing: what you actually pay
Verified against windsurf.com/pricing on May 20, 2026:
| Plan | Price | Quota | Key extras |
|---|---|---|---|
| Free | $0/mo | Daily/weekly limits | Tab (unlimited), SWE-1.6 free tier, all premium models |
| Pro | $20/mo | Unlimited (extra at API price) | Deploys, Fast Context, SWE-1.5, all models |
| Max | $200/mo | Unlimited (extra at API price) | Devin Cloud access, centralized billing, admin dashboard, priority support |
| Teams | $40/user/mo | Unlimited (extra at API price) | SSO, RBAC, access control, volume discounts |
| Enterprise | Custom | Unlimited | Hybrid deployment, all Teams features |
Students with a verified .edu email get approximately 50% off Pro — roughly $10/mo.
The Max tier at $200/mo is where the Cognition acquisition becomes financially visible: you're paying for b