DEV Community

Ps4Atom


Has anyone else seen prompt caching break because of UUIDs/timestamps near the front?

Hey everyone,

I’ve been working on an open-source tool called CacheSentry, and I’m looking for feedback from people building real LLM apps.

The problem it focuses on is prompt-cache regressions.

In long-prompt apps, the beginning of the prompt is often mostly stable:

  • system instructions
  • tool schemas
  • policies
  • retrieved context
  • memory
  • conversation structure

But small dynamic fields can accidentally get inserted near the front:

  • UUIDs
  • timestamps
  • request IDs
  • session IDs
  • dynamic metadata
  • shuffled tool/schema order

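Prompt caches generally match on an exact prefix, so a single changing value near the front invalidates everything after it. That silently destroys the stable prefix and reduces prompt-cache reuse.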
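A minimal sketch of why position matters, assuming the cache reuses only the longest identical leading run of tokens. Whitespace splitting stands in for a real tokenizer here, and `SYSTEM`, `prompt_with_early_id`, and `prompt_with_late_id` are made-up names for illustration, not part of any library.

```python
import uuid

def shared_prefix_len(a, b):
    """Count leading tokens that are identical in both sequences."""
    n = 0
    for x, y in zip(a, b):
        if x != y:
            break
        n += 1
    return n

# Hypothetical stable instructions: 9 words repeated 50 times = 450 "tokens".
# Whitespace splitting is an assumption standing in for a real tokenizer.
SYSTEM = "You are a support assistant. Follow the policies below. " * 50

def prompt_with_early_id(question):
    # A fresh UUID is the very first token, so no two requests share a prefix.
    return f"request_id={uuid.uuid4()} {SYSTEM} User: {question}".split()

def prompt_with_late_id(question):
    # The same UUID placed after the stable block leaves that block reusable.
    return f"{SYSTEM} request_id={uuid.uuid4()} User: {question}".split()

early = shared_prefix_len(prompt_with_early_id("a"), prompt_with_early_id("b"))
late = shared_prefix_len(prompt_with_late_id("a"), prompt_with_late_id("b"))
print(f"shared prefix with early id: {early} tokens")
print(f"shared prefix with late id:  {late} tokens")
```

The two prompts carry the same information in both layouts. Only the order changes, and with it the length of the reusable prefix.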

In one controlled validation I ran:

  • Stable prompt: 2,816 cached tokens
  • UUID near the front: 0 cached tokens
  • UUID moved later: 2,816 cached tokens again
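A check like this can be scripted when the provider reports cached-token counts in the response. The sketch below assumes a usage payload shaped like the one in OpenAI's Chat Completions API, where the count is at `usage.prompt_tokens_details.cached_tokens`. The post does not say which provider was used, and other providers name this field differently. The sample dicts simply mirror the three numbers above.

```python
def cached_tokens(usage):
    """Return the cached-token count from a usage dict, or 0 if it is absent."""
    details = usage.get("prompt_tokens_details") or {}
    return details.get("cached_tokens", 0)

# Illustrative usage payloads mirroring the three runs above.
# The field layout is an assumption about the provider's response shape.
runs = {
    "stable prompt": {"prompt_tokens_details": {"cached_tokens": 2816}},
    "UUID near the front": {"prompt_tokens_details": {"cached_tokens": 0}},
    "UUID moved later": {"prompt_tokens_details": {"cached_tokens": 2816}},
}

# Flag every run in which nothing was served from the cache.
broken = [name for name, usage in runs.items() if cached_tokens(usage) == 0]

for name, usage in runs.items():
    print(f"{name}: {cached_tokens(usage)} cached tokens")
print("runs with no cache reuse:", broken)
```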

That was the moment I realized this is the kind of issue most teams probably won’t catch in code review.

So I built CacheSentry.

It analyzes prompt traces and can:

  • compare against a known-good baseline
  • detect dynamic fields near the stable prefix
  • estimate reusable token loss
  • identify the culprit field
  • fail CI when cacheability regresses
  • compare predictions with runtime cache signals where available
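To make a couple of these checks concrete, here is a rough sketch of baseline comparison plus a CI-style exit code. This is not CacheSentry's actual code or API. The patterns, the `window` size, and the function names are all assumptions chosen for illustration.

```python
import re

# Hypothetical patterns for values that usually change on every request.
VOLATILE_PATTERNS = {
    "uuid": re.compile(
        r"\b[0-9a-fA-F]{8}-[0-9a-fA-F]{4}-[0-9a-fA-F]{4}"
        r"-[0-9a-fA-F]{4}-[0-9a-fA-F]{12}\b"
    ),
    "iso_timestamp": re.compile(r"\b\d{4}-\d{2}-\d{2}T\d{2}:\d{2}:\d{2}"),
}

def find_volatile_fields(prompt, window=2000):
    """Return (kind, offset) pairs for volatile-looking values in the first `window` characters."""
    head = prompt[:window]
    hits = []
    for kind, pattern in VOLATILE_PATTERNS.items():
        for match in pattern.finditer(head):
            hits.append((kind, match.start()))
    return sorted(hits, key=lambda hit: hit[1])

def check_prompt(baseline, candidate, window=2000):
    """Return 1 if the candidate has a volatile field kind the baseline lacked, else 0."""
    known = {kind for kind, _ in find_volatile_fields(baseline, window)}
    new_hits = [
        (kind, offset)
        for kind, offset in find_volatile_fields(candidate, window)
        if kind not in known
    ]
    for kind, offset in new_hits:
        print(f"cacheability regression: {kind} at character offset {offset}")
    return 1 if new_hits else 0

baseline = "System: be concise.\nTools: search, calc\n"
candidate = "request_id=123e4567-e89b-12d3-a456-426614174000\n" + baseline

# A CI script would pass this value to sys.exit() to fail the build.
exit_code = check_prompt(baseline, candidate)
print("exit code:", exit_code)
```

Regex matching only catches values with a recognizable shape. Shuffled tool or schema order would need a structural comparison against the baseline instead.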

The idea is simple:

Unit tests check correctness.

Evals check output quality.

Observability checks latency and cost.

CacheSentry checks whether a prompt/template change broke cacheability.

Repo:
https://github.com/PS4Emp/cachesentry

I’m especially interested in feedback from people building agents, RAG systems, long-context apps, LiteLLM gateways, or OpenTelemetry-based LLM observability.

Has anyone here seen prompt caching behave unexpectedly because dynamic content moved too early in the prompt?
