DEV Community

vmx
Repeat Yourself

Turns out if you repeat your prompt, the model gives you a better answer.

Not a smarter model. Not a bigger context window. Not chain of thought. You just say the same thing twice and it works better. Google researchers tested this across Gemini, GPT, Claude, and DeepSeek -- 47 wins out of 70 benchmark comparisons, zero losses.
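The whole technique fits in one line. A minimal sketch (my own helper, not code from the paper -- the idea is just sending the same prompt twice in a single message):

```python
def repeat_prompt(prompt: str, times: int = 2) -> str:
    """Duplicate a prompt so the model sees it multiple times in one input."""
    return "\n\n".join([prompt] * times)

# The model receives both copies as a single input:
print(repeat_prompt("List three prime numbers greater than 100."))
```

That's it -- no fine-tuning, no system prompt tricks. The separator between copies is my own choice; the paper's exact formatting may differ.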

The reason is the kind of thing that makes you stare at your screen for a minute. In a transformer, token 1 can't see token 50. That's causal masking: each token only attends to what came before it. So the first words of your prompt are always processed with the least context. They're flying blind. When you repeat the prompt, the second copy's early tokens can attend to the entire first copy. You're giving the beginning of your question the context it never had.
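You can see the asymmetry directly in the mask itself. A toy sketch (illustrative, not the paper's code): with a causal mask, position i attends only to positions 0..i, so the first token of a single copy sees one position, while the first token of a second copy sees the whole first copy.

```python
import numpy as np

n = 6  # pretend the prompt is 6 tokens long

# Causal mask for one copy: True where attention is allowed.
single = np.tril(np.ones((n, n), dtype=bool))
print(single[0].sum())  # first token sees only itself -> 1

# Two back-to-back copies: 12 positions total.
double = np.tril(np.ones((2 * n, 2 * n), dtype=bool))

# The second copy's first token sits at position n, so it can
# attend to the entire first copy plus itself.
print(double[n].sum())  # -> n + 1 = 7
```

The same lower-triangular structure is what real implementations apply (e.g. via an `is_causal` flag), just at the scale of thousands of positions.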

The architecture has a constraint; nobody notices because the output is good enough; then someone tries the dumbest possible fix and it works, because the constraint was real. Retries fix distributed systems. Caches fix slow queries. Repeating yourself fixes attention asymmetry.

The part that got me though -- reasoning models already do this. When you turn on chain of thought, the effect disappears. Turns out models trained with reinforcement learning independently learned to repeat parts of the question back before answering. The architecture had a flaw, and the training process found the same workaround on its own.

Paper: Prompt Repetition Improves Non-Reasoning LLMs -- Leviathan, Kalman, Matias (2025)
