# Double Prompting: Zero-Cost LLM Accuracy Boost
Just send your prompt twice. Accuracy improves across every major model.
## The Problem: Asymmetric Attention
LLMs process text left-to-right with causal attention. When you write the context first and the question second, the context tokens are encoded before the model has seen the question, so their representations were built without knowing what will be asked. This is structural, not a bug.
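A toy sketch makes the asymmetry concrete. The `can_attend` helper and token names below are illustrative only, not any real model's internals:

```python
# Causal attention: position i may only attend to positions j <= i.
def can_attend(i: int, j: int) -> bool:
    """True if token at position i can attend to token at position j."""
    return j <= i

tokens = ["ctx_0", "ctx_1", "ctx_2", "question"]
q = len(tokens) - 1  # the question comes last

# The question token sees the whole context...
assert all(can_attend(q, j) for j in range(len(tokens)))
# ...but no context token ever sees the question.
assert not any(can_attend(i, q) for i in range(q))
```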
## The Fix: Second Pass
Duplicate the prompt. In the second copy, every token sits after the first copy's question, so it can attend to it; the context is effectively re-encoded with question awareness.
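A minimal sketch of the trick; the function name and separator text are my own choices, not prescribed by the technique:

```python
def double_prompt(context: str, question: str) -> str:
    """Build a prompt containing the context+question block twice.

    Every token in the second copy appears after the first copy's
    question, so under causal attention it can attend to it.
    """
    block = f"{context}\n\nQuestion: {question}"
    return f"{block}\n\n---\n\n{block}"


prompt = double_prompt(
    "The treaty was signed in 1648 after decades of war.",
    "When was the treaty signed?",
)
```

Send `prompt` as a single user message; nothing else in the request changes.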
## Results
Tested across 7 benchmarks × 7 models (Gemini, GPT, Claude, DeepSeek):
| Metric | Result |
|---|---|
| Accuracy | Gains on every model |
| Latency | No meaningful increase |
| Output tokens | No overhead |
| Best case | 21% → 97% accuracy (4.6x) |
## Why It Works
- Pass 1: establishes the context-question relationship in the sequence
- Pass 2: every repeated token attends to the entire first pass, question included, giving effectively bidirectional attention over the original prompt
Input tokens are processed in parallel on the GPU (prefill), so the extra latency is negligible.
## Trade-offs
Pros:
- Zero code changes beyond prompt formatting
- No model switching required
- Works for complex reasoning tasks
Cons:
- 2x input token cost (input tokens are cheaper than output tokens, so the impact is usually small)
- Doesn't replace task-specific fine-tuning
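To see why doubling the input is usually tolerable, here is a back-of-envelope calculator. The per-token prices are placeholder assumptions for illustration, not any provider's real rates:

```python
# Placeholder rates, assumed for illustration only.
INPUT_PRICE = 0.50 / 1_000_000   # $ per input token (assumed)
OUTPUT_PRICE = 5.00 / 1_000_000  # $ per output token (assumed)

def request_cost(input_tokens: int, output_tokens: int, doubled: bool = False) -> float:
    """Cost of one request; doubling the prompt repeats only the input side."""
    n_in = input_tokens * (2 if doubled else 1)
    return n_in * INPUT_PRICE + output_tokens * OUTPUT_PRICE

base = request_cost(2_000, 500)
twice = request_cost(2_000, 500, doubled=True)
overhead = twice / base - 1  # relative cost increase from doubling
```

With these assumed rates, doubling a 2,000-token input under a 500-token answer raises total request cost by under 30%, because output tokens dominate the bill.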
## When To Use
- Production prompts needing accuracy boost
- Complex reasoning tasks
- Scenarios with no latency budget to spare (the second pass adds almost none)
## The Lesson
Sometimes the best optimization isn't more parameters. It's using what you have twice.