With recent launches from Cursor and Windsurf, the coding model race just got serious.
Cursor dropped Composer-1 during its big 2.0 launch, and just hours later, Windsurf released its SWE-1.5 model. Both are being hyped as “agentic coding” models, LLMs that can not only write code but also plan, review, and iterate like a real engineer.
So I had to put both to the test in some real coding challenges to see how they stack up in practice.
In this article, we’ll see what Composer-1 and SWE-1.5 can do, and how well they compare in actual coding.
TL;DR
If you just want the short version, here’s what I found:
- Windsurf SWE-1.5 consistently produced more reliable and complete code. It was much faster, often finishing in half the time of Cursor’s Composer-1.
- Cursor Composer-1 has solid reasoning and architecture planning but often slowed down or froze mid-generation. Its results were functional but less polished.
- For complex, multi-file projects, SWE-1.5 handled dependencies and interactions far better. Composer-1 still felt experimental, though promising in structure and approach.
If speed, stability, and real-world usability matter, SWE-1.5 is the clear winner. If you care more about innovation and potential in an agentic workflow, Composer-1 is worth watching as it matures.
If you prefer a Video version, check this out:
How were the tests done?
For fairness, I ran both models under similar conditions.
Both were accessed via their respective coding environments — Cursor 2.0 for Composer-1 and Windsurf for SWE-1.5.
Each model was given identical prompts and tested on two coding challenges:
- Responsive Typing Game (Monkeytype Clone)
- 3D Solar System Simulator
All runs were recorded live, and execution speed was noted based on total wall-time until the final working output.
A Quick Look at Cursor's Composer-1
Let’s start with Cursor’s Composer-1.
This model is built for “agentic coding”: it doesn’t just complete your code, it thinks through multi-step changes, explains fixes, and can plan end-to-end implementations.
Cursor claims it’s 4× faster than competitors, powered by a mixture-of-experts (MoE) setup where specialized sub-models tackle different tasks.
It’s trained through reinforcement learning in sandboxed environments, meaning it learned by coding thousands of apps instead of just reading static GitHub data.
Most responses were completed in under 30 seconds, though complex multi-file generations took longer.
The base model isn’t public, but speculation suggests it’s built on GLM-4.6 or Qwen3-Coder.
A Quick Look at Windsurf's SWE-1.5
Now let’s talk about Windsurf’s SWE-1.5.
SWE-1.5 is marketed as “agency-grade”, optimized for coding agents that can reason across multiple files, terminals, and APIs simultaneously.
It runs on NVIDIA GB200 clusters, pushing around 950 tokens/sec, which is roughly 13× faster than Claude Sonnet and 6× faster than Haiku.
It’s designed for large-scale workflows:
- Works across multiple files and repositories
- Handles code, docs, APIs, and terminal context
- Supports 128K tokens, enough for entire project folders
Like Composer-1, SWE-1.5 also uses an MoE + RLHF approach, but scales it with distributed training and human-feedback loops fine-tuned for engineering use.
Quick Comparison Table:
| Feature | Cursor Composer-1 | Windsurf SWE 1.5 |
|---|---|---|
| Core Technology | MoE + RL fine-tuning | MoE + RLHF, agency workflows |
| Speed | 4x faster than peers (<30s) | 950 tokens/sec (13x Claude) |
| Context Window | Extended, long-term | 128K tokens |
| Multi-modal Capability | Code, docs, UIs | Code, docs, APIs, UIs, terminal |
| Results in These Tests | Functional but slower; less polished output | Faster, cleaner, more complete output |
| Transparency | No base model disclosed | No base model disclosed |
| Cloud Hardware | Custom NVIDIA clusters | NVIDIA GB200 clusters, Cerebras |
Coding Comparison
For this comparison, I tested both models on two projects to see how they handle multi-step reasoning, UI logic, and runtime stability.
1. Responsive Typing Game (Monkeytype Clone)
First, I tried to create a responsive typing game like Monkeytype. Here's the prompt that I gave:
Prompt:
Build a modern, highly responsive typing game inspired by Monkeytype, with a clean and elegant UI based on the Catppuccin color themes (choose one or allow toggling between Latte, Frappe, Macchiato, Mocha). The typing area should feature randomized or curated famous texts (quotes, speeches, literature) displayed in the center of the screen.
**Key Requirements:**
- The user must type the exact characters, including uppercase and lowercase, punctuation, and spaces.
- As the user types correctly, a fire or flame animation should appear under each typed character (like a typing trail).
- Mis-typed characters should be clearly marked with a subtle but visible indicator (e.g., red underline or animated shake).
- A minimalist virtual keyboard should be shown at the bottom center, softly glowing when keys are pressed.
- Include features such as WPM, accuracy, time, and combo streak counter.
- Once the user finishes typing the passage, show a summary screen with statistics and an animated celebration (like fireworks or confetti).
**Design Aesthetic:**
- Soft but expressive, using the **Catppuccin** palette.
- Typography should be elegant and readable (e.g., Fira Code, JetBrains Mono).
- Use soft drop shadows, rounded corners, and smooth transitions.
Be creative in how you implement the fire animation. For example, the flame could rise up gently from the letter just typed, or the typing trail could resemble a burning path.
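Before looking at the outputs, it helps to see what the scoring part of this prompt boils down to. Below is my own minimal sketch, in vanilla JavaScript, of the WPM, accuracy, and combo tracking both models had to implement. It is not taken from either model's output, and names like `recordKeystroke` and `currentStats` are mine.

```javascript
// My own minimal sketch of the stats the prompt asks for (not model output).
let startTime = null;   // set on the first keystroke
let typedChars = 0;     // total characters typed
let correctChars = 0;   // characters matching the target text
let comboStreak = 0;    // consecutive correct characters

function recordKeystroke(expected, actual) {
  if (startTime === null) startTime = performance.now();
  typedChars++;
  if (actual === expected) {
    correctChars++;
    comboStreak++;
  } else {
    comboStreak = 0;    // a mistyped character resets the streak
  }
}

function currentStats() {
  const minutes = (performance.now() - startTime) / 60000;
  return {
    // Standard convention: one "word" = 5 characters
    wpm: minutes > 0 ? Math.round((correctChars / 5) / minutes) : 0,
    accuracy: typedChars > 0 ? Math.round((correctChars / typedChars) * 100) : 100,
    combo: comboStreak,
  };
}
```

The logic itself is simple; what separates the two models in this test is how well they wire it into a polished, animated UI.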
Composer-1:
The implementation worked but lacked smoothness.
It generated a solid typing area and word-tracking logic but missed polish on UI animation and event handling (for example, the text in the typing box was not visible).
It was also slow: completion took around 3 minutes, and while the output ran, it wasn’t visually refined.
Here's the Output:
SWE-1.5:
This one nailed it.
The model delivered a functional, clean, and responsive typing interface on the first try.
The animation, WPM logic, and accuracy counter worked seamlessly, and it completed in under a minute.
Here's the Output:
Overall, SWE-1.5 was the clear winner in this round: faster, cleaner, and with better output.
2. 3D Solar System Simulator
Then, I tried creating a 3D solar system. The main goal here was to test how well each model works with external libraries like Three.js.
Here's the prompt that I gave:
Prompt:
3D Solar System Simulator
Create an interactive 3D solar system simulation using Three.js and WebGL.
Planets orbit the Sun at different speeds, and the camera can freely rotate, zoom, or focus on specific planets.
Core Requirements:
- Use Three.js (CDN import, no bundlers).
- Include at least the Sun and 4 planets (Mercury, Venus, Earth, Mars).
- Each planet orbits the Sun at a unique speed and distance.
- Add a “time speed” slider to control orbit speed.
- Implement OrbitControls for camera movement and zoom.
- Clicking a planet focuses the camera on it.
Design & Aesthetics:
- Use emissive glow or bloom for the Sun.
- Give each planet a distinct color or texture.
- Add simple ambient + directional light for visibility.
- Use soft gradients or starfield background for space.
- Include smooth camera transitions when focusing on a planet.
Mechanics:
- Animate orbits using trigonometric rotation around the Sun.
- Use accurate scaling ratios for distances and sizes (roughly proportional, not real-world scale).
- Add optional orbit trails to visualize paths.
- Maintain smooth frame rate with GPU-efficient updates.
Use Three.js and vanilla JavaScript to build the demo in one HTML file.
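To make the “trigonometric rotation” requirement concrete, here is a minimal sketch of the kind of orbit loop both models needed to write. Again, this is my own illustration, not either model's output: it assumes Three.js is loaded as the CDN global `THREE`, and it leaves out OrbitControls, click-to-focus, the starfield, and the slider UI.

```javascript
// Minimal orbit sketch: Sun + 4 planets, trigonometric orbits, time-speed control.
// Assumes Three.js is loaded from a CDN as the global `THREE` (no bundler).
const scene = new THREE.Scene();
const camera = new THREE.PerspectiveCamera(60, innerWidth / innerHeight, 0.1, 1000);
camera.position.set(0, 25, 40);
const renderer = new THREE.WebGLRenderer({ antialias: true });
renderer.setSize(innerWidth, innerHeight);
document.body.appendChild(renderer.domElement);

// Sun with a basic (unlit) material so it reads as glowing,
// plus a point light at the origin so the Sun acts as the light source
const sun = new THREE.Mesh(
  new THREE.SphereGeometry(4, 32, 32),
  new THREE.MeshBasicMaterial({ color: 0xffcc33 })
);
scene.add(sun);
scene.add(new THREE.AmbientLight(0xffffff, 0.3));
scene.add(new THREE.PointLight(0xffffff, 1.5));

// Each planet gets its own orbital radius and relative speed
const planets = [
  { color: 0xb1b1b1, distance: 8,  size: 0.6, speed: 4.1 },  // Mercury
  { color: 0xe6c229, distance: 12, size: 1.0, speed: 1.6 },  // Venus
  { color: 0x3a7bd5, distance: 16, size: 1.1, speed: 1.0 },  // Earth
  { color: 0xc1440e, distance: 22, size: 0.8, speed: 0.53 }, // Mars
].map((p) => {
  const mesh = new THREE.Mesh(
    new THREE.SphereGeometry(p.size, 24, 24),
    new THREE.MeshStandardMaterial({ color: p.color })
  );
  scene.add(mesh);
  return { ...p, mesh, angle: Math.random() * Math.PI * 2 };
});

let timeSpeed = 1; // wire this to the "time speed" slider

function animate() {
  requestAnimationFrame(animate);
  for (const p of planets) {
    // Advance each planet's angle at its own speed, scaled by the slider
    p.angle += 0.01 * p.speed * timeSpeed;
    p.mesh.position.set(
      Math.cos(p.angle) * p.distance,
      0,
      Math.sin(p.angle) * p.distance
    );
  }
  renderer.render(scene, camera);
}
animate();
```

The interesting part is that each planet advances its own angle at its own speed, scaled by the slider value; that per-planet orbit logic is exactly the piece this test probes.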
Composer-1:
This time, Composer-1 generated the code faster than in the previous test!
The Three.js setup and overall structure were correct, but it struggled with the orbit logic and camera transitions (also, the planets were hard to spot).
The simulation loaded, but planet motion was inconsistent, and output latency was higher.
Here's the Output:
SWE-1.5:
It was blazing fast, almost in the blink of an eye!
It implemented realistic planetary orbits, scaling, and camera motion perfectly.
The app ran smoothly with accurate 3D rendering in most cases, though it lagged a bit when focusing from one planet to another.
Here's the Output:
Overall, SWE-1.5 again leads by a wide margin, with better 3D math, cleaner visuals, and far faster generation.
Conclusion
That's a Wrap!
Both Composer-1 and SWE-1.5 push the boundary of agentic coding.
They can plan, code, and debug like early versions of autonomous dev partners, not just assistants.
But in these tests:
- SWE-1.5 delivered faster and more stable results across both tasks.
- Composer-1 showed solid reasoning but slower responses and less consistent code.
If you’re optimizing for speed, stability, and output quality, go with Windsurf SWE-1.5.
If you want to explore agentic behavior and reasoning depth, Cursor Composer-1 has interesting potential; it just needs refinement.
Either way, it’s clear we’re entering a new era where coding assistants are no longer passive; they’re becoming collaborators.
Let me know what you think of these two models in the comments below!