🦆 Stop Copy-Pasting Between AI Tabs — Use MCP Rubber Duck Instead

You're debugging a nasty async bug. You explain it to an LLM. It gives you a confident answer.

But is it the right answer?

You could paste the same question into GPT, Gemini, and Llama to triple-check. But now you're juggling tabs, reformatting prompts, and losing context.

What if one command could ask them all — and show you where they agree and where they don't?


Different LLMs behave differently. GPT-5.1 might reach for one pattern, Gemini for another, and Llama for something else entirely. Sometimes the "wrong" model spots what the "right" one glossed over.

That's rubber-duck debugging meets AI — except now the duck talks back, and you get a whole panel of ducks instead of just one.

What is MCP Rubber Duck?

MCP Rubber Duck is an open-source server that lets you query multiple LLMs at once through a single interface. Instead of switching between ChatGPT, Claude, and Gemini tabs, you send one prompt and see them all respond side-by-side.

(MCP is a standard protocol that lets AI tools like Claude Desktop use external servers — think of it like plugins for your AI assistant.)

Think of it as rubber duck debugging, but your ducks are AI models that can:

  • Surface different explanations and approaches
  • Highlight trade-offs between their proposed solutions
  • Rank candidate solutions with confidence scores
  • Refine answers over multiple rounds based on feedback from all models

Why This Actually Helps

We've all copy-pasted an answer from one model into another tab to "double-check" it. That's not overkill — it's rational. Different models have different training data, biases, and strengths.

The problem? It's tedious. You're juggling tabs, retyping prompts, and manually eyeballing differences between answers.

MCP Rubber Duck turns that copy-paste routine into a single command, with all the responses in one place.

Features That Matter

🦆 Duck Council

Get responses from all your configured ducks in a single call. Ask once, compare multiple perspectives side-by-side.

await duck_council({
  prompt: "Should I use REST or GraphQL for my API?"
});

Perfect for: architecture decisions, library comparisons, sanity-checking high-impact choices.
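
If you're driving the tools from a script rather than chatting through Claude, reading the result might look roughly like this. It's a sketch in the same illustrative style as the snippet above; the response shape (an array of per-duck answers with duck and response fields) is an assumption, so check the docs for the real one.

// Sketch only: assumes duck_council resolves to an array of
// { duck, response } objects. The actual shape may differ.
const council = await duck_council({
  prompt: "Should I use REST or GraphQL for my API?"
});
for (const { duck, response } of council) {
  console.log(`🦆 ${duck}: ${response}`);
}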

πŸ—³οΈ Multi-Duck Voting

Have your ducks vote on concrete options, with reasoning and confidence scores. Ideal when you're choosing between stacks, databases, or deployment strategies.

await duck_vote({
  question: "Best database for real-time chat?",
  options: ["PostgreSQL", "MongoDB", "Redis", "Cassandra"]
});

Example output:

πŸ—³οΈ DUCK VOTE RESULTS
━━━━━━━━━━━━━━━━━━━━━

πŸ“Š Vote Tally:
  Redis:      β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 2 votes (67%)
  PostgreSQL: β–ˆβ–ˆβ–ˆβ–ˆβ–ˆ 1 vote (33%)

πŸ† Winner: Redis (Majority Consensus)

πŸ¦† GPT Duck: "Redis for pub/sub and sub-ms latency..."
πŸ¦† Gemini Duck: "Redis is purpose-built for real-time..."
πŸ¦† Groq Duck: "PostgreSQL with LISTEN/NOTIFY could work..."

βš–οΈ LLM-as-Judge

Let one duck act as judge: it scores, ranks, and critiques the other ducks' answers against your criteria (correctness, depth, clarity, etc.).

const evaluation = await duck_judge({
  responses: councilResponses,  // from duck_council
  criteria: ["accuracy", "completeness", "practicality"],
  persona: "senior backend engineer"
});
// Returns ranked responses with scores and critique

Use it to automatically pick the strongest response, or get structured peer review on code, designs, and specs.
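
For example, picking the judge's top-ranked answer might look like this (again a sketch; the ranked, score, response, and critique field names are assumptions, not documented API):

// Sketch: grab the highest-ranked answer from the judge's evaluation.
// Field names (ranked, score, response, critique) are assumptions.
const [best] = evaluation.ranked;
console.log(`Top answer (score: ${best.score}): ${best.response}`);
console.log(`Judge's notes: ${best.critique}`);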

🔄 Iterative Refinement

Run an automatic critique-and-revise loop between two ducks: one attacks the draft with detailed feedback, the other rewrites it. You control the number of rounds.

const polished = await duck_iterate({
  prompt: "Write a migration plan for Postgres to MongoDB",
  iterations: 3,
  mode: "critique-improve"
});

Great for turning rough notes into solid design docs, or iterating on prompts until they stop sucking.
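
Pulling out the result could look like this (sketch; the final and history field names are assumptions):

// Sketch: print the final draft and how many critique rounds ran.
// Field names (final, history) are assumptions.
console.log(polished.final);
console.log(`Critique rounds: ${polished.history.length}`);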

🎓 Structured Debates

Run Oxford-style, Socratic, or adversarial debates between your ducks on any technical question. You define the motion, rules, and number of rounds.

const debate = await duck_debate({
  topic: "Microservices vs monolith for a 5-person startup",
  format: "oxford",  // or "socratic", "adversarial"
  rounds: 3
});
// Returns full transcript + summary of agreements, disagreements, and trade-offs

Use it to stress-test your architecture or validate trade-offs before you commit. It sounds silly until you watch GPT and Gemini argue about whether you really need Kubernetes.
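
To skim the outcome without reading the whole transcript, something like this works (sketch; the summary, transcript, duck, and argument field names are assumptions):

// Sketch: print the debate summary, then each duck's turn.
// Field names (summary, transcript, duck, argument) are assumptions.
console.log(debate.summary);
for (const turn of debate.transcript) {
  console.log(`${turn.duck}: ${turn.argument}`);
}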


See It In Action: A Real Example

Let's say you're choosing a database for a real-time chat feature. Here's the full workflow:

Step 1: Ask the council

const council = await duck_council({
  prompt: "Best database for real-time chat with 10K concurrent users?"
});

Step 2: Get multiple perspectives

🦆 GPT Duck: "Redis is ideal — sub-millisecond latency, built-in pub/sub..."
🦆 Gemini Duck: "Redis for real-time, but consider Postgres for persistence..."
🦆 Groq Duck: "ScyllaDB if you need both speed and durability at scale..."

Step 3: Let them vote

const vote = await duck_vote({
  question: "Best primary database for real-time chat?",
  options: ["Redis", "PostgreSQL", "ScyllaDB", "MongoDB"]
});

Step 4: See the consensus

πŸ† Winner: Redis (67% majority)
πŸ“Š GPT Duck: Redis (confidence: 85%) β€” "Pub/sub is purpose-built for this"
πŸ“Š Gemini Duck: Redis (confidence: 78%) β€” "Latency requirements favor Redis"
πŸ“Š Groq Duck: ScyllaDB (confidence: 65%) β€” "Better durability trade-off"

Your decision: Redis for the real-time layer, with the Groq Duck's point about durability noted for your persistence strategy.

This took 30 seconds instead of 30 minutes of tab-switching.
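
If you want a structured tiebreaker on top of the vote, you can chain the judge onto the council responses from Step 1 (a sketch; the criteria are just example values and the ranked field name is an assumption):

// Sketch: have one duck rank the council's answers from Step 1.
// The ranked field name is an assumption; criteria are example values.
const verdict = await duck_judge({
  responses: council,
  criteria: ["latency", "durability", "operational overhead"],
  persona: "senior database engineer"
});
console.log(verdict.ranked[0]);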


Setup

Prerequisites:

  • Node.js 18+ (node -v to check)
  • Claude Desktop installed (download)
  • At least one LLM API key (OpenAI, Gemini, etc.)

Step 1: Install globally

npm install -g mcp-rubber-duck

Step 2: Find your Claude Desktop config

Open claude_desktop_config.json:

  • macOS: ~/Library/Application Support/Claude/claude_desktop_config.json
  • Windows: %APPDATA%\Claude\claude_desktop_config.json

Step 3: Add the Rubber Duck server

{
  "mcpServers": {
    "rubber-duck": {
      "command": "mcp-rubber-duck",
      "env": {
        "MCP_SERVER": "true",
        "OPENAI_API_KEY": "your-openai-key",
        "GEMINI_API_KEY": "your-gemini-key"
      }
    }
  }
}

You can use just one provider — omit any keys you don't have.

Step 4: Restart and test

  1. Restart Claude Desktop
  2. Look for "rubber-duck" in the MCP servers list
  3. Try: "Ask my duck panel: what's the best way to handle errors in async JavaScript?"

If it doesn't appear, check the JSON syntax and run mcp-rubber-duck in a terminal to see errors.


Advanced: MCP Bridge (Optional)

Once your basic duck is working, you can give it superpowers by connecting to other MCP servers (documentation search, filesystem access, databases).

For example, with a docs server connected:

You: "Find React hooks docs and summarize the key patterns."

Duck: fetches 5,000 tokens of docs, returns 500 tokens of essentials

This keeps your context window clear and costs down.

Quick setup: Add to your env variables:

"MCP_BRIDGE_ENABLED": "true",
"MCP_SERVER_CONTEXT7_URL": "https://mcp.context7.com/mcp"

See the full MCP Bridge docs for details.


Built-in Safety

MCP Rubber Duck includes guardrails so you don't accidentally blow your budget:

  • Rate limiting — cap requests per model
  • Token ceilings — hard limits on prompt/response size
  • PII redaction — optional filters for emails, secrets, IDs

Configure via environment variables. See docs for details.


When to Use This

Reach for MCP Rubber Duck when:

  • You're stuck on a tricky bug and want 3-4 hypotheses side-by-side
  • You're making architecture decisions and want multiple models to critique your approach
  • You're evaluating which model works best for your specific prompts
  • You're building AI workflows that need redundancy (if Model A hallucinates, B and C save the run)

Maybe skip it if:

  • You're happy with single-model responses
  • You don't want to manage multiple API keys
  • You're on a tight budget (3 models = 3x API costs)

What's Next

The project is actively maintained. Coming soon:

  • Weighted voting — models that perform better count more
  • Domain-specific duck personas — pre-tuned for security review, code review, docs
  • Disagreement detection — alerts when your ducks strongly disagree

Track progress in GitHub Issues.


The Bottom Line

If you've ever:

  • pasted the same question into three AI tabs, or
  • argued with one model about a subtle bug for 30 minutes

...MCP Rubber Duck turns that whole dance into a single command with a duck panel.

It's free, open-source, and yes — it's got ducks.


Try It Now

npm install -g mcp-rubber-duck

⭐ Star on GitHub — helps more devs find it

📖 Read the docs — full setup and API reference


Your Turn

I'd love to hear:

  • Which models are in your duck panel?
  • What's the wildest disagreement your ducks have had?

Drop your setup or favorite prompt in the comments β€” I might feature the best ones in a follow-up post!
