
When I first started using LLMs seriously, my strategy was simple:
Put everything in one long prompt and hope it works.
Requirements. Constraints. Logs. Code. Edge cases.
All in one place.
It usually worked.
Until it didn’t.
Sometimes the model ignored a constraint I clearly wrote.
Sometimes it contradicted something in the prompt.
Sometimes giving it more context made the answer worse.
I even used to write things like: “Analyze our entire codebase and follow our coding patterns.” Our codebase at Taskip was massive. Looking back, that was… optimistic 😁.
There’s a reason for that.
The “Lost in the Middle” Problem
A research paper called “Lost in the Middle” studied how LLMs use long contexts.
Researchers gave models many documents and placed the correct information:
- At the beginning
- In the middle
- At the end
If long context worked perfectly, performance would be the same everywhere.
It wasn’t.
Models performed best when the relevant information was at:
- The beginning
- The end
And worst when it was in the middle.
In some cases, performance in the middle was even worse than giving the model no documents at all.
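The setup can be sketched in a few lines. This is a simplified, hypothetical version of the experiment (the function name and the toy "needle" fact are mine, not from the paper): the same relevant document is inserted at different positions among distractors, and only the position changes.

```python
def build_prompt(distractors, relevant, position):
    """Insert the one relevant document at `position` among the distractors,
    then number all documents and append the question."""
    docs = distractors[:position] + [relevant] + distractors[position:]
    numbered = "\n".join(f"Document {i+1}: {d}" for i, d in enumerate(docs))
    return f"{numbered}\n\nQuestion: What is the capital of France?\nAnswer:"

distractors = [f"(irrelevant passage {i})" for i in range(9)]
relevant = "Paris is the capital of France."

# Same information, same token count -- only the position differs.
start = build_prompt(distractors, relevant, position=0)
middle = build_prompt(distractors, relevant, position=5)
end = build_prompt(distractors, relevant, position=9)
```

Running each variant against a model and scoring the answers is what produces the U-shaped accuracy curve the paper reports: high at `start` and `end`, low at `middle`.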
That’s not a small effect.
Why This Feels Familiar
Interestingly, this isn’t just an LLM problem.
LLMs are built on neural networks — loosely inspired by how biological neural networks (our brains) work. And humans show a similar pattern called the serial-position effect.
When we read a book, we usually remember:
- The opening
- The ending
More clearly than the middle chapters.
In conversations, we often recall how something started or how it ended, but details from the middle fade faster.
Even though transformer models can technically attend to every token equally, in practice they show a similar bias. The beginning and end tend to have more influence.
That parallel is suggestive, but it isn't an explanation: why transformers develop the same bias is still an open research question.
Bigger Context Windows Don’t Fix It
You might think:
“Okay, but newer models have 100k or 200k tokens. That should solve it.”
Not really.
The research shows extended-context versions of models perform almost the same as smaller-context versions when the input fits in both.
So:
- Larger context = more space
- Not necessarily better reasoning
Larger context windows give you more room — but they don’t automatically improve how well the model uses long inputs.
More tokens ≠ better usage.
Why This Matters for Developers
If you:
- Feed large code files into prompts
- Pass long logs
- Add many constraints
- Keep long chat histories
…then important information placed in the middle may get underweighted.
That explains why sometimes the model ignores a rule you clearly wrote.
It’s not random.
It’s positional bias.
Practical Prompting Strategy
After reading this, I changed how I structure prompts.
1. Put critical rules at the top
Output format. Hard constraints. Non-negotiables.
```
You must return valid JSON.
Do not include explanations.
Follow the schema exactly.

Here is the data:
...
```
2. Reinforce key constraints at the end
The end also gets strong attention.
```
Remember:
- Output must be valid JSON
- No explanations
```
3. Keep the middle for supporting content
Code, logs, documentation, background info — that can sit in the middle.
4. Don’t let chats grow forever
Long conversations can dilute important instructions.
Sometimes starting a new, clean prompt gives better results than continuing a huge thread.
The Core Idea
LLMs don’t use long context evenly.
They’re strongest at the start and end.
The middle is weaker.
So structure your prompts like this:
- Top → Critical instructions
- Middle → Supporting data
- Bottom → Reinforcement
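That three-part structure is easy to enforce in code. A minimal sketch (the function name and example strings are illustrative, not a standard API):

```python
def sandwich_prompt(rules: str, data: str, reminder: str) -> str:
    """Top: hard constraints. Middle: supporting data. Bottom: reinforcement."""
    return "\n\n".join([rules, data, reminder])

prompt = sandwich_prompt(
    rules="You must return valid JSON. Follow the schema exactly.",
    data="Here is the supporting data:\n(code, logs, docs)",
    reminder="Remember: output must be valid JSON, no explanations.",
)
```

Keeping the assembly in one place like this also makes it harder for critical rules to drift into the middle as the data section grows.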
Prompt structure isn’t just formatting.
It directly affects output quality.