This was a tiny-model before/after run with a very ordinary goal: keep the answer useful when the wording changes.
The setup used Qwen2.5-0.5B-Instruct with a memory layer around it.
The measured result
From the proof pack:
- Before latency: 10,061.7 ms
- After latency: 4,652.6 ms
- Before tokens: 35
- After tokens: 97
- Tokens saved: -177.1%
- Latency delta: -5,409.1 ms
- Peak RSS: 1,794 MB
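The deltas above follow directly from the before/after numbers. A quick sanity check of that arithmetic:

```python
# Recompute the proof-pack deltas from the raw before/after numbers.
before_ms, after_ms = 10_061.7, 4_652.6
before_tokens, after_tokens = 35, 97

latency_delta_ms = after_ms - before_ms  # negative means the after run was faster
tokens_saved_pct = (before_tokens - after_tokens) / before_tokens * 100  # negative means more tokens used

print(f"Latency delta: {latency_delta_ms:+.1f} ms")  # -5409.1 ms
print(f"Tokens saved: {tokens_saved_pct:+.1f}%")     # -177.1%
```

The negative "tokens saved" just means the after run spent 2.8x the tokens, and still cut latency roughly in half.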
That is a nice reminder that “smaller prompt” is not always the same thing as “better answer”. Sometimes the smarter move is to give the model the right memory, even if it costs a few more tokens.
What the demo showed
The before run used the raw prompt with no memory. The after run prepended a compressed memory summary that kept the useful facts and dropped the filler.
That is the point of this kind of system: stay useful when the wording changes.
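The post doesn't show the memory layer's implementation, but the shape of the idea can be sketched. This is a hypothetical illustration, not the actual code behind the demo: `compress_memory`, `build_prompt`, and the filler list are all made up for this example.

```python
# Hypothetical sketch of a "compressed memory summary":
# keep short factual notes, strip filler words, and prepend the result
# to the prompt so the answer stays stable when the wording changes.

FILLER = {"um", "basically", "like", "honestly", "really"}

def compress_memory(notes: list[str], max_facts: int = 5) -> str:
    """Drop filler words from each note and cap the number of facts kept."""
    facts = []
    for note in notes:
        words = [w for w in note.split() if w.lower().strip(",.") not in FILLER]
        if words:
            facts.append(" ".join(words))
    return "; ".join(facts[:max_facts])

def build_prompt(memory: str, question: str) -> str:
    """Prepend the compressed facts so the model sees them on every rewording."""
    return f"Known facts: {memory}\nQuestion: {question}"

summary = compress_memory([
    "basically the user prefers metric units",
    "honestly the project deadline is Friday",
])
print(build_prompt(summary, "When is it due?"))
```

The trade-off is the one the numbers show: the summary adds tokens to every request, but the model answers from the right facts instead of guessing.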
Links
- Proof pack: https://zo.pub/man42/qwen-sky-proof
- GitHub profile: https://github.com/AmSach
- Instagram: https://www.instagram.com/i.amsach
- LinkedIn: https://www.linkedin.com/in/theamansachan