Has anyone else seen prompt caching break because of UUIDs/timestamps near the front?

Ps4Atom — Mon, 29 Jun 2026 09:00:20 +0000

Hey everyone,

I’ve been working on an open-source tool called CacheSentry, and I’m looking for feedback from people building real LLM apps.

The problem it focuses on is prompt-cache regressions.

In long-prompt apps, the beginning of the prompt is often mostly stable:

system instructions
tool schemas
policies
retrieved context
memory
conversation structure

But small dynamic fields can accidentally get inserted near the front:

UUIDs
timestamps
request IDs
session IDs
dynamic metadata
shuffled tool/schema order

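That can silently break the stable prefix. Prompt caches typically match on an exact prefix, so a single changed value near the front means everything after it stops being reused, even if the rest of the prompt is identical.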
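To make the "shuffled tool/schema order" case concrete, here is a minimal sketch. The tool definitions and the helper names (`render_tools`, `shared_prefix_chars`) are made up for illustration and are not part of CacheSentry. It assumes the tool schemas are serialized to JSON near the front of the prompt and compares two requests that register the same tools in a different order.

```python
import json


def render_tools(tools: list[dict], canonical: bool) -> str:
    """Serialize tool schemas into the text that sits near the front of the prompt."""
    if canonical:
        # Sort tools by name and keys alphabetically so the output is byte-stable.
        tools = sorted(tools, key=lambda t: t["name"])
        return json.dumps(tools, sort_keys=True)
    return json.dumps(tools)


def shared_prefix_chars(a: str, b: str) -> int:
    """Count the leading characters that are identical in both strings."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n


# Hypothetical tools, used only for this example.
search = {"name": "search", "description": "Search the docs"}
refund = {"name": "refund", "description": "Issue a refund"}

# The same two tools, registered in a different order on two requests.
request_1 = [search, refund]
request_2 = [refund, search]

raw_1, raw_2 = render_tools(request_1, False), render_tools(request_2, False)
canon_1, canon_2 = render_tools(request_1, True), render_tools(request_2, True)

print("shared chars without canonical order:", shared_prefix_chars(raw_1, raw_2), "of", len(raw_1))
print("identical with canonical order:", canon_1 == canon_2)
```

Without a canonical order the two serializations diverge within the first tool entry, so nothing after that point could be shared. Sorting before serializing is one possible fix, not the only one.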

In one controlled validation I ran:

Stable prompt: 2,816 cached tokens

UUID near the front: 0 cached tokens

UUID moved later: 2,816 cached tokens again
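The same three cases can be reproduced in a toy model. This is only a sketch: it counts whitespace-separated words instead of real tokens, the `STABLE` text and the `build` variants are invented for the example, and the numbers it prints are not the 2,816 cached tokens from the real run.

```python
import uuid

# Stand-in for system instructions, tool schemas, and policies (2,000 words).
STABLE = "policy " * 2000


def shared_prefix_words(a: str, b: str) -> int:
    """Count leading whitespace-separated words that match exactly."""
    n = 0
    for x, y in zip(a.split(), b.split()):
        if x != y:
            break
        n += 1
    return n


def build(variant: str, request_id: str) -> str:
    """Render one prompt; only the position of the request ID changes."""
    if variant == "stable":
        return STABLE + "User: hello"
    if variant == "uuid_front":
        return f"id={request_id} " + STABLE + "User: hello"
    if variant == "uuid_late":
        return STABLE + "User: hello " + f"id={request_id}"
    raise ValueError(variant)


results = {}
for variant in ("stable", "uuid_front", "uuid_late"):
    # Two requests with different IDs, as two real calls would have.
    first = build(variant, str(uuid.uuid4()))
    second = build(variant, str(uuid.uuid4()))
    results[variant] = shared_prefix_words(first, second)
    print(f"{variant}: {results[variant]} shared prefix words")
```

The pattern matches the validation above: the shared prefix collapses to zero when the ID comes first and is fully restored when the ID moves to the end.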

That was the moment I realized this is the kind of issue most teams probably won’t catch in code review.

So I built CacheSentry.

It analyzes prompt traces and can:

compare against a known-good baseline
detect dynamic fields near the stable prefix
estimate reusable token loss
identify the culprit field
fail CI when cacheability regresses
compare predictions with runtime cache signals where available
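For anyone curious what "detect dynamic fields" and "identify the culprit field" could look like, here is a rough sketch. It is not CacheSentry's actual implementation or API. The patterns, the `Finding` type, and the word-based loss estimate are all assumptions made for this example.

```python
import re
from dataclasses import dataclass

# Hypothetical patterns; a real tool would likely need many more.
PATTERNS = {
    "uuid": re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}-"
        r"[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ),
    "iso_timestamp": re.compile(r"\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"),
}


@dataclass
class Finding:
    kind: str
    word_offset: int    # words before the field, i.e. what can still be shared
    words_at_risk: int  # words from the field onward, which can no longer match


def find_dynamic_fields(prompt: str) -> list[Finding]:
    """Return dynamic-looking fields, earliest first (words approximate tokens)."""
    total = len(prompt.split())
    findings = []
    for kind, pattern in PATTERNS.items():
        for match in pattern.finditer(prompt):
            before = prompt[: match.start()]
            offset = len(before.split())
            if before and not before[-1].isspace():
                offset -= 1  # match starts mid-word, so that word is affected too
            findings.append(Finding(kind, offset, total - offset))
    return sorted(findings, key=lambda f: f.word_offset)


prompt = (
    "You are a helpful assistant.\n"
    "request_id: 123e4567-e89b-12d3-a456-426614174000\n"
    + "Policy text. " * 100
    + "\nNow: 2026-06-29T09:00:20Z"
)

findings = find_dynamic_fields(prompt)
culprit = findings[0]  # the earliest field does the most damage
print(f"culprit: {culprit.kind} at word {culprit.word_offset}, "
      f"{culprit.words_at_risk} words at risk")
```

The earliest dynamic field is treated as the culprit because everything after it falls outside the shared prefix. The timestamp at the very end is also reported, but it puts almost nothing at risk.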

The idea is simple:

Unit tests check correctness.

Evals check output quality.

Observability checks latency and cost.

CacheSentry checks whether a prompt/template change broke cacheability.
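As a rough picture of what such a check might look like in CI, here is a small sketch. It does not use CacheSentry's real interface. The templates, the sample inputs, and the 5% threshold are assumptions picked for the example, and words stand in for tokens.

```python
def shared_prefix_words(a: str, b: str) -> int:
    """Count leading whitespace-separated words that match exactly."""
    n = 0
    for x, y in zip(a.split(), b.split()):
        if x != y:
            break
        n += 1
    return n


def check_cacheability(baseline_renders, candidate_renders, max_loss_ratio=0.05) -> int:
    """Return an exit code: 0 if the shared prefix is preserved, 1 if it regressed."""
    base = shared_prefix_words(*baseline_renders)
    cand = shared_prefix_words(*candidate_renders)
    if base == 0:
        return 0  # nothing was cacheable before, so nothing can regress
    loss = (base - cand) / base
    return 1 if loss > max_loss_ratio else 0


SYSTEM = "Follow the refund policy. " * 200  # stand-in for the stable part


def old_template(ts: str, question: str) -> str:
    # Known-good baseline: all dynamic content comes after the stable part.
    return f"{SYSTEM}\nUser: {question}"


def new_template(ts: str, question: str) -> str:
    # Candidate change: someone added a timestamp at the top.
    return f"Time: {ts}\n{SYSTEM}\nUser: {question}"


# Two sample requests with different dynamic values.
SAMPLES = [
    ("2026-06-29T09:00:20Z", "where is my order?"),
    ("2026-06-29T09:05:41Z", "cancel my plan"),
]

baseline = [old_template(ts, q) for ts, q in SAMPLES]
candidate = [new_template(ts, q) for ts, q in SAMPLES]

exit_code = check_cacheability(baseline, candidate)
print("cacheability check exit code:", exit_code)
# In a CI job this would end with: raise SystemExit(exit_code)
```

Rendering each template with at least two different inputs is the key design choice here: a single render cannot show which parts of the prompt actually vary between requests.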

Repo:
https://github.com/PS4Emp/cachesentry

I’m especially interested in feedback from people building agents, RAG systems, long-context apps, LiteLLM gateways, or OpenTelemetry-based LLM observability.

Has anyone here seen prompt caching behave unexpectedly because dynamic content moved too early in the prompt?

DEV Community: Ps4Atom