Last week Claude Code was stuck on an architecture decision. Instead of going back and forth with one model, I thought — what if it could phone a friend? Or three?
So I built brainstorm-mcp, an MCP server that gives your coding agent a brainstorming team. Claude poses the question, GPT-5, Gemini, DeepSeek, and others each bring their perspective, then they build on each other's ideas across multiple rounds. A synthesizer distills everything into a consolidated recommendation.
It's not about picking a winner — it's about getting perspectives you'd never get from a single model.
## How it works
- You give it a topic: "Design the architecture for a next-gen AI-powered code review tool"
- All models respond independently in Round 1 — diverse, unbiased first takes
- In Round 2, they see each other's ideas and build on them, challenge assumptions, or offer alternatives
- A synthesizer distills the best ideas into a final recommendation
Claude participates as a brainstormer with full context of your codebase, so the discussion is grounded in your actual code — not abstract advice.
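The flow above can be sketched in a few lines of TypeScript. This is a hypothetical shape for illustration only: the `Model` type, the `respond` method, and the prompt wording are my assumptions, not brainstorm-mcp's actual internals.

```typescript
// Hypothetical sketch of the multi-round brainstorm loop.
// Real brainstorm-mcp internals may differ.

type Model = {
  name: string;
  respond: (prompt: string) => Promise<string>;
};

async function brainstorm(
  topic: string,
  models: Model[],
  rounds = 2
): Promise<string[]> {
  // Round 1: every model answers independently, seeing only the topic.
  let ideas = await Promise.all(
    models.map((m) => m.respond(`Brainstorm: ${topic}`))
  );

  // Later rounds: each model sees the others' previous answers and
  // builds on them, challenges them, or offers alternatives.
  for (let round = 2; round <= rounds; round++) {
    ideas = await Promise.all(
      models.map((m, i) => {
        const others = ideas.filter((_, j) => j !== i).join("\n---\n");
        return m.respond(
          `Topic: ${topic}\nOther perspectives:\n${others}\nBuild on or challenge them.`
        );
      })
    );
  }
  return ideas; // the final round's ideas go to the synthesizer
}
```

Keeping the rounds model-parallel (`Promise.all`) is what makes a full debate finish in around a minute rather than serially stacking each model's latency.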
## Real example: AI code review architecture
I asked three models to design an AI-powered code review tool that goes beyond linting. Here's what each one brought to the table:
GPT-5.2 went deep on systems architecture — a full Temporal-orchestrated pipeline with a Repo Knowledge Graph, multi-pass LLM review (intent inference → local correctness → architectural reasoning → team alignment), and a feedback loop that learns from accepted/rejected suggestions. Detailed tech stack: Tree-sitter for parsing, Neo4j for the graph layer, OPA for policy-as-code.
DeepSeek focused on the data flow and learning loop — how the system should ingest PRs, build context packs from the dependency neighborhood, and gradually learn team conventions. Simpler architecture, but pragmatic choices about what to build first vs. defer.
Claude (with codebase context from my actual project) grounded the discussion — pointed out which components I already had, where existing CI pipelines could be reused, and which parts of the architecture were overkill for my scale.
The synthesis combined GPT's comprehensive systems design with DeepSeek's pragmatic sequencing and Claude's "here's what you actually need right now" reality check. 2 rounds, 75 seconds, ~11k tokens, $0.07.
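To make the synthesis step concrete, here's a hedged TypeScript sketch of what it might look like. The `Model` type, the `synthesize` function, and the prompt text are illustrative assumptions; the real synthesizer prompt is surely more elaborate.

```typescript
type Model = {
  name: string;
  respond: (prompt: string) => Promise<string>;
};

// Hypothetical synthesis step: one model reads all final-round
// perspectives and distills them into a single recommendation.
async function synthesize(
  topic: string,
  ideas: string[],
  synthesizer: Model
): Promise<string> {
  const prompt = [
    `Topic: ${topic}`,
    `You are the synthesizer. Combine the strongest points below into one consolidated recommendation.`,
    ...ideas.map((idea, i) => `--- Perspective ${i + 1} ---\n${idea}`),
  ].join("\n");
  return synthesizer.respond(prompt);
}
```

The point of a dedicated synthesis pass is that no single brainstormer has to be "right": the synthesizer can take GPT's breadth, DeepSeek's sequencing, and Claude's codebase grounding and keep only what survives contact with all three.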
## What surprised me
Each model has a genuine personality. GPT goes wide and comprehensive — you'll get a 6-section architecture doc. DeepSeek favors elegant simplicity. Claude with codebase context catches things the others miss entirely. The value isn't in any single response — it's in the combination.
Round 2 is where the magic happens. Round 1 gives you breadth — independent first takes. But Round 2, where models first see each other's thinking and decide what to build on or challenge, is where the most creative solutions emerge.
Cheaper than you'd think. A full debate costs $0.02 to $0.07: less than a single complex prompt to a reasoning model, and you get three perspectives refined over multiple rounds.
Local models add real value. Running Llama via Ollama alongside cloud models adds diversity. They think differently — less polished but more willing to suggest unconventional approaches.
## Setup
It's an MCP server, so if you use Claude Code, Claude Desktop, or any MCP client:
```shell
npx brainstorm-mcp
```
Add your API keys and you're done. Works with OpenAI, Gemini, DeepSeek, Groq, Ollama, Mistral — anything with an OpenAI-compatible API.
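For Claude Desktop, the MCP server entry would look something like the following. The overall `mcpServers` shape is the standard Claude Desktop config format, but the exact environment variable names here are my guesses; check the brainstorm-mcp README for the real ones.

```json
{
  "mcpServers": {
    "brainstorm": {
      "command": "npx",
      "args": ["brainstorm-mcp"],
      "env": {
        "OPENAI_API_KEY": "sk-...",
        "GEMINI_API_KEY": "...",
        "DEEPSEEK_API_KEY": "..."
      }
    }
  }
}
```

Any provider you leave out is simply not invited to the debate, so you can start with one or two keys and add more later.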
## One more thing
While building brainstorm-mcp, I kept losing track of decisions and tasks across sessions. Claude would forget what we'd discussed yesterday. So I built saga-mcp — a Jira-like project tracker that runs as an MCP server. SQLite-backed, 31 tools, full hierarchy from Projects down to Subtasks. One tracker_dashboard call gives your AI agent full context to pick up where it left off.
Both are free, open source, MIT licensed.