Has anyone else seen prompt caching break because of UUIDs/timestamps near the front of the prompt?

#discuss #llm #performance #showdev

Hey everyone,

I’ve been working on an open-source tool called CacheSentry, and I’m looking for feedback from people building real LLM apps.

The problem it focuses on is prompt-cache regressions.

In long-prompt apps, the beginning of the prompt is often mostly stable:

system instructions
tool schemas
policies
retrieved context
memory
conversation structure

But small dynamic fields can accidentally get inserted near the front:

UUIDs
timestamps
request IDs
session IDs
dynamic metadata
shuffled tool/schema order

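That can silently break the stable prefix. Provider-side prompt caches generally match on an exact prefix, so once an early token changes between requests, everything after it stops being reusable and prompt-cache reuse drops.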
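To make the prefix effect concrete, here is a minimal sketch (not CacheSentry's code). It assumes whitespace-split words as a stand-in for real tokenizer output, and `build_prompt` and `shared_prefix_len` are hypothetical helper names. It compares two requests that differ only in a request ID, once with the ID first and once with the ID last.

```python
import uuid


def shared_prefix_len(a: list[str], b: list[str]) -> int:
    """Count the leading tokens that are identical in both sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


def build_prompt(request_id: str, id_first: bool) -> list[str]:
    # Assumption: whitespace "tokens" approximate what a real tokenizer produces.
    stable = ("You are a support assistant. " * 50).split()  # 250 stable tokens
    dynamic = [f"request_id={request_id}"]
    return dynamic + stable if id_first else stable + dynamic


# Two requests with different UUIDs, dynamic field at the front.
early_a = build_prompt(str(uuid.uuid4()), id_first=True)
early_b = build_prompt(str(uuid.uuid4()), id_first=True)

# Same two requests, dynamic field moved to the end.
late_a = build_prompt(str(uuid.uuid4()), id_first=False)
late_b = build_prompt(str(uuid.uuid4()), id_first=False)

print(shared_prefix_len(early_a, early_b))  # 0
print(shared_prefix_len(late_a, late_b))    # 250
```

The prompt content is the same size in both cases; only the position of the one changing token decides how much of it two requests have in common.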

In one controlled validation I ran:

Stable prompt: 2,816 cached tokens

UUID near the front: 0 cached tokens

UUID moved later: 2,816 cached tokens again

That was the moment I realized this is the kind of issue most teams probably won’t catch in code review.

So I built CacheSentry.

It analyzes prompt traces and can:

compare against a known-good baseline
detect dynamic fields near the stable prefix
estimate reusable token loss
identify the culprit field
fail CI when cacheability regresses
compare predictions with runtime cache signals where available
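As a rough illustration of the "detect dynamic fields" and "identify the culprit" steps, here is a small sketch. It is not how CacheSentry is implemented; the pattern list, the `window` parameter, and both function names are assumptions made up for this example, and character offsets are used as a crude proxy for token positions.

```python
import re

# Hypothetical patterns; a real scanner would need a longer, tuned list.
PATTERNS = {
    "uuid": re.compile(
        r"[0-9a-f]{8}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{4}-[0-9a-f]{12}", re.I
    ),
    "iso_timestamp": re.compile(r"\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"),
}


def find_early_dynamic_fields(prompt: str, window: int = 500):
    """Return (kind, offset, text) for matches starting in the first `window` chars."""
    hits = []
    for kind, pattern in PATTERNS.items():
        for m in pattern.finditer(prompt):
            if m.start() < window:
                hits.append((kind, m.start(), m.group()))
    # Earliest match first: that is the field that cuts the prefix shortest.
    return sorted(hits, key=lambda h: h[1])


def estimate_lost_fraction(prompt: str, first_offset: int) -> float:
    """Fraction of the prompt that sits after the first dynamic field (by characters)."""
    return 1 - first_offset / len(prompt)


prompt = (
    "trace_id=123e4567-e89b-12d3-a456-426614174000\n"
    + "System: follow the policy.\n" * 40
)
hits = find_early_dynamic_fields(prompt)
culprit = hits[0]
print(culprit[0], culprit[1])  # uuid 9
print(round(estimate_lost_fraction(prompt, culprit[1]), 3))
```

Regexes only catch known shapes, so diffing against a known-good baseline trace is the more general check: it flags any early divergence, whatever the field looks like.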

The idea is simple:

Unit tests check correctness.

Evals check output quality.

Observability checks latency and cost.

CacheSentry checks whether a prompt/template change broke cacheability.
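A CI gate along those lines could look roughly like the sketch below. This is an assumed shape, not CacheSentry's real interface: `check_cacheability`, the `min_ratio` threshold, and the whitespace tokenization are all hypothetical.

```python
from itertools import takewhile


def reusable_prefix_tokens(baseline: list[str], candidate: list[str]) -> int:
    """Number of leading tokens the candidate shares with the baseline."""
    pairs = zip(baseline, candidate)
    return sum(1 for _ in takewhile(lambda p: p[0] == p[1], pairs))


def check_cacheability(
    baseline: list[str], candidate: list[str], min_ratio: float = 0.9
) -> bool:
    """True if the candidate keeps at least `min_ratio` of the baseline as prefix."""
    if not baseline:
        return True
    kept = reusable_prefix_tokens(baseline, candidate)
    return kept / len(baseline) >= min_ratio


# Assumption: whitespace tokens stand in for real tokenizer output.
baseline = "SYSTEM rules TOOLS schema_a schema_b USER question".split()
regressed = "SYSTEM req-42 rules TOOLS schema_a schema_b USER question".split()
moved_later = "SYSTEM rules TOOLS schema_a schema_b USER question req-42".split()

for name, candidate in [("regressed", regressed), ("moved_later", moved_later)]:
    ok = check_cacheability(baseline, candidate)
    exit_code = 0 if ok else 1  # a CI job could pass this to sys.exit()
    print(name, "ok" if ok else "cacheability regressed", exit_code)
```

The point of gating on a ratio rather than an exact match is that prompts legitimately change at the tail; only a change that shortens the shared front should fail the build.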

Repo:
https://github.com/PS4Emp/cachesentry

I’m especially interested in feedback from people building agents, RAG systems, long-context apps, LiteLLM gateways, or OpenTelemetry-based LLM observability.

Has anyone here seen prompt caching behave unexpectedly because dynamic content moved too early in the prompt?

Top comments (2)

UnitBuilds • Jun 29

In my experience, when LLMs break like this, it usually comes down to one thing: structured-output enforcement is a myth. A model can output clean JSON 99.999% of the time, but in the remaining 0.001% it leaks, and that leak poisons the entire context, because it causes more leaks, and before you know it the model's context is entirely unusable.

So good move on the baseline check. It keeps things universal: if the model suddenly bleeds a UUID into a date and everything gets jumbled, the check can spot it.

Marouane K • Jul 15

Hi ps4emp, I saw your post about CacheSentry and the issue with UUIDs/timestamps. I've worked with similar problems in the past. Have you considered using a content orchestration tool like Clypify to streamline your content management?