You're debugging a nasty async bug. You explain it to an LLM. It gives you a confident answer.
But is it the right answer?
You could paste the same question into GPT, Gemini, and Llama to triple-check. But now you're juggling tabs, reformatting prompts, and losing context.
What if one command could ask them all, and show you where they agree and where they don't?
Different LLMs behave differently. GPT-5.1 might reach for one pattern, Gemini for another, and Llama for something else entirely. Sometimes the "wrong" model spots what the "right" one glossed over.
That's rubber-duck debugging meets AI, except now the duck talks back, and you get a whole panel instead of one.
What is MCP Rubber Duck?
MCP Rubber Duck is an open-source server that lets you query multiple LLMs at once through a single interface. Instead of switching between ChatGPT, Claude, and Gemini tabs, you send one prompt and see them all respond side-by-side.
(MCP, the Model Context Protocol, is a standard that lets AI tools like Claude Desktop use external servers; think of it like plugins for your AI assistant.)
Think of it as rubber duck debugging, but your ducks are AI models that can:
- Surface different explanations and approaches
- Highlight trade-offs between their proposed solutions
- Rank candidate solutions with confidence scores
- Refine answers over multiple rounds based on feedback from all models
Why This Actually Helps
We've all copy-pasted an answer from one model into another tab to "double-check" it. That's not overkill; it's rational. Different models have different training data, biases, and strengths.
The problem? It's tedious. You're juggling tabs, retyping prompts, and manually eyeballing differences between answers.
MCP Rubber Duck turns that copy-paste routine into a single command, with all the responses in one place.
Features That Matter
🦆 Duck Council
Get responses from all your configured ducks in a single call. Ask once, compare multiple perspectives side-by-side.
await duck_council({
prompt: "Should I use REST or GraphQL for my API?"
});
Perfect for: architecture decisions, library comparisons, sanity-checking high-impact choices.
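If you're driving it from a script rather than your MCP client, you might print the answers side by side like this. One caveat: the return shape below (an array of { duck, response } objects) is my assumption, not the documented API, so check the project docs before relying on it.
// Sketch only: assumes duck_council resolves to an array of
// { duck, response } objects - the real return shape may differ.
const council = await duck_council({
  prompt: "Should I use REST or GraphQL for my API?"
});

for (const { duck, response } of council) {
  console.log(`${duck}:\n${response}\n`);
}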
🗳️ Multi-Duck Voting
Have your ducks vote on concrete options, with reasoning and confidence scores. Perfect when you're choosing between stacks, databases, or deployment strategies.
await duck_vote({
question: "Best database for real-time chat?",
options: ["PostgreSQL", "MongoDB", "Redis", "Cassandra"]
});
Example output:
🗳️ DUCK VOTE RESULTS
━━━━━━━━━━━━━━━━━━━━
📊 Vote Tally:
Redis:      ████████ 2 votes (67%)
PostgreSQL: ████     1 vote  (33%)
🏆 Winner: Redis (Majority Consensus)
🦆 GPT Duck: "Redis for pub/sub and sub-ms latency..."
🦆 Gemini Duck: "Redis is purpose-built for real-time..."
🦆 Groq Duck: "PostgreSQL with LISTEN/NOTIFY could work..."
⚖️ LLM-as-Judge
Let one duck act as judge: it scores, ranks, and critiques the other ducks' answers against your criteria (correctness, depth, clarity, etc.).
const evaluation = await duck_judge({
responses: councilResponses, // from duck_council
criteria: ["accuracy", "completeness", "practicality"],
persona: "senior backend engineer"
});
// Returns ranked responses with scores and critique
Use it to automatically pick the strongest response, or get structured peer review on code, designs, and specs.
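To act on that automatically, you could grab the judge's top pick and move on. The field names here (rankings, duck, score, critique) are illustrative guesses at the return shape, not a documented schema:
// Sketch: assumes the judge returns a rankings array sorted best-first,
// with hypothetical duck / score / critique fields.
const [best] = evaluation.rankings;
console.log(`Top answer from ${best.duck} (score: ${best.score})`);
console.log(best.critique);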
🔄 Iterative Refinement
Run an automatic critique-and-revise loop between two ducks: one attacks the draft with detailed feedback, the other rewrites it. You control the number of rounds.
const polished = await duck_iterate({
prompt: "Write a migration plan for Postgres to MongoDB",
iterations: 3,
mode: "critique-improve"
});
Great for turning rough notes into solid design docs, or iterating on prompts until they stop sucking.
🎭 Structured Debates
Run Oxford-style, Socratic, or adversarial debates between your ducks on any technical question. You define the motion, rules, and number of rounds.
const debate = await duck_debate({
topic: "Microservices vs monolith for a 5-person startup",
format: "oxford", // or "socratic", "adversarial"
rounds: 3
});
// Returns full transcript + summary of agreements, disagreements, and trade-offs
Use it to stress-test your architecture or validate trade-offs before you commit. It sounds silly until you watch GPT and Gemini argue about whether you really need Kubernetes.
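If you only want the verdict rather than the full back-and-forth, something like this could work; transcript and summary are assumed field names based on the comment above, so verify them against the real return value:
// Sketch: transcript and summary are assumed field names, not a documented schema.
console.log(debate.summary); // agreements, disagreements, trade-offs
debate.transcript.forEach((turn, i) => {
  console.log(`Round ${i + 1}: ${turn}`);
});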
See It In Action: A Real Example
Let's say you're choosing a database for a real-time chat feature. Here's the full workflow:
Step 1: Ask the council
const council = await duck_council({
prompt: "Best database for real-time chat with 10K concurrent users?"
});
Step 2: Get multiple perspectives
🦆 GPT Duck: "Redis is ideal - sub-millisecond latency, built-in pub/sub..."
🦆 Gemini Duck: "Redis for real-time, but consider Postgres for persistence..."
🦆 Groq Duck: "ScyllaDB if you need both speed and durability at scale..."
Step 3: Let them vote
const vote = await duck_vote({
question: "Best primary database for real-time chat?",
options: ["Redis", "PostgreSQL", "ScyllaDB", "MongoDB"]
});
Step 4: See the consensus
🏆 Winner: Redis (67% majority)
🦆 GPT Duck: Redis (confidence: 85%) - "Pub/sub is purpose-built for this"
🦆 Gemini Duck: Redis (confidence: 78%) - "Latency requirements favor Redis"
🦆 Groq Duck: ScyllaDB (confidence: 65%) - "Better durability trade-off"
Your decision: Redis for the real-time layer, with the Groq Duck's point about durability noted for your persistence strategy.
This took 30 seconds instead of 30 minutes of tab-switching.
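If you'd rather run the whole flow as one script instead of step by step, it could look roughly like this, reusing the exact calls from above (result parsing left out, since the return shapes aren't documented here):
// End-to-end sketch: gather perspectives, then force a vote.
async function chooseChatDatabase() {
  const council = await duck_council({
    prompt: "Best database for real-time chat with 10K concurrent users?"
  });

  const vote = await duck_vote({
    question: "Best primary database for real-time chat?",
    options: ["Redis", "PostgreSQL", "ScyllaDB", "MongoDB"]
  });

  return { council, vote };
}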
Setup
Prerequisites:
- Node.js 18+ (run node -v to check)
- Claude Desktop installed
- At least one LLM API key (OpenAI, Gemini, etc.)
Step 1: Install globally
npm install -g mcp-rubber-duck
Step 2: Find your Claude Desktop config
Open claude_desktop_config.json:
- macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
- Windows: %APPDATA%\Claude\claude_desktop_config.json
Step 3: Add the Rubber Duck server
{
"mcpServers": {
"rubber-duck": {
"command": "mcp-rubber-duck",
"env": {
"MCP_SERVER": "true",
"OPENAI_API_KEY": "your-openai-key",
"GEMINI_API_KEY": "your-gemini-key"
}
}
}
}
You can use just one provider; omit any keys you don't have.
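For example, an OpenAI-only setup keeps the same structure and just drops the Gemini key:
{
  "mcpServers": {
    "rubber-duck": {
      "command": "mcp-rubber-duck",
      "env": {
        "MCP_SERVER": "true",
        "OPENAI_API_KEY": "your-openai-key"
      }
    }
  }
}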
Step 4: Restart and test
- Restart Claude Desktop
- Look for "rubber-duck" in the MCP servers list
- Try: "Ask my duck panel: what's the best way to handle errors in async JavaScript?"
If it doesn't appear, check the JSON syntax and run mcp-rubber-duck in a terminal to see errors.
Advanced: MCP Bridge (Optional)
Once your basic duck is working, you can give it superpowers by connecting to other MCP servers (documentation search, filesystem access, databases).
For example, with a docs server connected:
You: "Find React hooks docs and summarize the key patterns."
Duck: fetches 5,000 tokens of docs, returns 500 tokens of essentials
This keeps your context window clear and costs down.
Quick setup: Add to your env variables:
"MCP_BRIDGE_ENABLED": "true",
"MCP_SERVER_CONTEXT7_URL": "https://mcp.context7.com/mcp"
See the full MCP Bridge docs for details.
Built-in Safety
MCP Rubber Duck includes guardrails so you don't accidentally blow your budget:
- Rate limiting: cap requests per model
- Token ceilings: hard limits on prompt/response size
- PII redaction: optional filters for emails, secrets, IDs
Configure via environment variables. See docs for details.
When to Use This
Reach for MCP Rubber Duck when:
- You're stuck on a tricky bug and want 3-4 hypotheses side-by-side
- You're making architecture decisions and want multiple models to critique your approach
- You're evaluating which model works best for your specific prompts
- You're building AI workflows that need redundancy (if Model A hallucinates, B and C save the run)
Maybe skip it if:
- You're happy with single-model responses
- You don't want to manage multiple API keys
- You're on a tight budget (3 models = 3x API costs)
What's Next
The project is actively maintained. Coming soon:
- Weighted voting: models that perform better count more
- Domain-specific duck personas: pre-tuned for security review, code review, docs
- Disagreement detection: alerts when your ducks strongly disagree
Track progress in GitHub Issues.
The Bottom Line
If you've ever:
- pasted the same question into three AI tabs, or
- argued with one model about a subtle bug for 30 minutes
...MCP Rubber Duck turns that whole dance into a single command with a duck panel.
It's free, open-source, and yes, it's got ducks.
Try It Now
npm install -g mcp-rubber-duck
⭐ Star on GitHub: helps more devs find it
📖 Read the docs: full setup and API reference
Your Turn
I'd love to hear:
- Which models are in your duck panel?
- What's the wildest disagreement your ducks have had?
Drop your setup or favorite prompt in the comments β I might feature the best ones in a follow-up post!
