Aamer Mihaysi

Posted on • Originally published at mehaisi.com

Double Prompting: Zero-Cost LLM Accuracy Boost


Just send your prompt twice. Accuracy improves across every major model.

The Problem: Asymmetric Attention

LLMs use causal attention: each token attends only to the tokens before it. When you write the context first and the question second, the context tokens are encoded with no awareness of what you're about to ask. This is structural, not a bug.

The Fix: Second Pass

Send the prompt twice in a single input. Attention is still causal, but every token in the second copy can attend to the entire first copy, so the context is re-encoded with full awareness of the question.
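A minimal sketch of the formatting. The exact layout, separator, and the `Question:` label are my assumptions; the post doesn't prescribe a specific template:

```python
def double_prompt(context: str, question: str) -> str:
    """Build a doubled prompt: context + question, repeated verbatim.

    In the second copy, the context tokens are processed after the model
    has already seen the question once, so they are re-encoded with
    question awareness.
    """
    single = f"{context}\n\nQuestion: {question}"
    return f"{single}\n\n{single}"

# The doubled string is sent as one ordinary user message, so no model
# switching or API changes are needed.
prompt = double_prompt(
    "Alice has 3 apples. Bob has twice as many.",
    "How many apples does Bob have?",
)
```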

Results

Tested across 7 benchmarks × 7 models (Gemini, GPT, Claude, DeepSeek):

| Metric | Result |
| --- | --- |
| Accuracy | Gains on every model |
| Latency | No meaningful increase |
| Output tokens | No overhead |
| Best case | 21% → 97% accuracy (4.6×) |

Why It Works

Pass 1: The first copy establishes the context-question relationship.
Pass 2: The second copy re-encodes the context with the question already in view.

Input (prefill) tokens are processed in parallel on the GPU, so the extra copy adds negligible latency.
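The attention geometry behind the two passes can be shown with simple index arithmetic. This is a toy illustration with made-up token counts, not actual model internals:

```python
# Toy token layout of a doubled prompt: [context][question][context][question]
ctx_len, q_len = 100, 10

# Single prompt: context occupies positions 0..99, question 100..109.
# Under causal attention, position i attends only to positions <= i,
# so no first-copy context position can attend to the question.
first_ctx = range(0, ctx_len)
first_q = range(ctx_len, ctx_len + q_len)

# Doubled prompt: the second copy of the context sits at 110..209,
# strictly after the first question, so every second-copy context
# token can attend back to the question.
second_ctx = range(ctx_len + q_len, 2 * ctx_len + q_len)

assert all(i < min(first_q) for i in first_ctx)   # first copy: blind to question
assert all(i > max(first_q) for i in second_ctx)  # second copy: sees question
```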

Trade-offs

Pros:

  • Zero code changes beyond prompt formatting
  • No model switching required
  • Works for complex reasoning tasks

Cons:

  • 2× input token cost (input tokens are typically far cheaper than output tokens, though)
  • Doesn't replace task-specific fine-tuning
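
To make the cost asymmetry concrete, here's a back-of-envelope calculation. The per-token prices are hypothetical placeholders, not any provider's actual rates:

```python
# Hypothetical per-token prices; real rates vary by provider and model.
INPUT_PRICE = 0.15 / 1_000_000   # $ per input token (assumed)
OUTPUT_PRICE = 0.60 / 1_000_000  # $ per output token (assumed)

def request_cost(input_tokens: int, output_tokens: int, doubled: bool = False) -> float:
    """Cost of one request; doubling the prompt doubles only the input side."""
    n_in = input_tokens * (2 if doubled else 1)
    return n_in * INPUT_PRICE + output_tokens * OUTPUT_PRICE

base = request_cost(2_000, 500)                  # single prompt
twice = request_cost(2_000, 500, doubled=True)   # doubled prompt
# With these assumed prices, doubling the input raises the total cost by
# only 50%, because output tokens dominate the bill.
```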

When To Use

  • Production prompts needing accuracy boost
  • Complex reasoning tasks
  • Latency-sensitive pipelines with no budget for slower techniques (the prefill overhead here is negligible)

The Lesson

Sometimes the best optimization isn't more parameters. It's using what you have twice.
