How I built narrative drift detection for LLM agent runs

#agents #llm #monitoring #showdev

Every LLM observability tool monitors
individual requests.

None of them monitor position consistency
across a conversation.

That's the gap I closed today in Ajah.

The problem:

In a long agent run or multi-turn
conversation, a model can reverse its
position under social pressure — and
nothing flags it. Turn 2 says one thing.
Turn 8 says the opposite. Both responses
look perfectly normal in isolation.

For healthcare, legal, and financial
AI systems, this is a liability.

How narrative drift detection works:

1. Every session turn stores up to 2000 characters of response text in Redis.
2. When a new request comes in with a session ID, Ajah fetches the full session history and passes it to the scorer.
3. The scorer extracts factual claims from each turn: sentences containing proper nouns, numbers, or absolute statements.
4. Claims are embedded using sentence-transformers and compared across turns using cosine similarity.
5. High similarity + negation markers = contradiction signal.
6. A drift_risk score and a drift_verdict (stable / possible_drift / drift_detected) are returned with every scored response.
7. The narrative_drift flag fires in the Warnings dashboard when drift_risk > 0.5.
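
Here is a minimal sketch of the steps above. It is not Ajah's actual code. To keep it runnable without extra services, a plain dict stands in for Redis and a bag-of-words vector stands in for sentence-transformers embeddings. The function names, the negation and absolute word lists, and the 0.25 cutoff for possible_drift are all assumptions made for illustration; only the 2000-character cap, the verdict labels, and the 0.5 flag threshold come from the description above.

```python
import math
import re
from collections import Counter

MAX_CHARS = 2000           # per-turn cap described in the post
DRIFT_THRESHOLD = 0.5      # narrative_drift flag threshold described in the post
POSSIBLE_THRESHOLD = 0.25  # assumption: the post does not give this cutoff
# Assumption: illustrative word lists, not the real marker sets.
NEGATIONS = {"not", "no", "never", "cannot", "don't", "doesn't", "isn't", "won't"}
ABSOLUTES = {"always", "never", "must", "all", "none"}


def store_turn(sessions, session_id, text):
    """Append a truncated response to the session (a dict stands in for Redis)."""
    sessions.setdefault(session_id, []).append(text[:MAX_CHARS])


def tokens(sentence):
    return re.findall(r"[a-z0-9']+", sentence.lower())


def extract_claims(text):
    """Crude heuristic: keep sentences with a number, an absolute word,
    or a capitalized word that is not the first word of the sentence."""
    claims = []
    for s in re.split(r"(?<=[.!?])\s+", text.strip()):
        words = s.split()
        has_number = bool(re.search(r"\d", s))
        has_absolute = bool(set(tokens(s)) & ABSOLUTES)
        has_proper = any(w[0].isupper() for w in words[1:])
        if has_number or has_absolute or has_proper:
            claims.append(s)
    return claims


def embed(sentence):
    """Stand-in embedding: word counts with negation words removed,
    so a negated claim still looks similar to the original claim."""
    return Counter(t for t in tokens(sentence) if t not in NEGATIONS)


def has_negation(sentence):
    return bool(set(tokens(sentence)) & NEGATIONS)


def cosine(a, b):
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(
        sum(v * v for v in b.values())
    )
    return dot / norm if norm else 0.0


def score_session(turns):
    """drift_risk = highest similarity between two claims from different
    turns where exactly one of the two contains a negation marker."""
    per_turn = [
        [(embed(c), has_negation(c)) for c in extract_claims(t)] for t in turns
    ]
    risk = 0.0
    for i in range(len(per_turn)):
        for j in range(i + 1, len(per_turn)):
            for vec_a, neg_a in per_turn[i]:
                for vec_b, neg_b in per_turn[j]:
                    if neg_a != neg_b:
                        risk = max(risk, cosine(vec_a, vec_b))
    if risk > DRIFT_THRESHOLD:
        verdict = "drift_detected"
    elif risk > POSSIBLE_THRESHOLD:
        verdict = "possible_drift"
    else:
        verdict = "stable"
    return {
        "drift_risk": round(risk, 3),
        "drift_verdict": verdict,
        "narrative_drift": risk > DRIFT_THRESHOLD,
    }


sessions = {}
store_turn(sessions, "demo", "The maximum safe dose is 4000 mg per day.")
store_turn(sessions, "demo", "You're right, the maximum safe dose is not 4000 mg per day.")
print(score_session(sessions["demo"]))  # verdict here is drift_detected
```

Word overlap misses paraphrased reversals ("safe" vs. "dangerous"), which is the reason to use real sentence embeddings in step 4. The structure stays the same either way: similarity says the two claims are about the same thing, and the negation mismatch says they disagree.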