I Gave Our Enterprise AI a Memory. It Started Citing Last Quarter's Incidents.

#ai #architecture #llm #rag

The first time the system surfaced a past incident on its own—without being asked—I had to double-check that I hadn't hard-coded it. I hadn't. The Hindsight memory layer had retrieved it from a similarity search against our vector store, and the AI had cited it as supporting context in its recommendation. That's the moment I understood why stateless LLMs are the wrong default for operational systems.
Here's how we built persistent organizational memory into SentinelOps AI, and what it actually changes about how the system behaves.
Why LLM Statelessness Is an Operational Problem
Every LLM interaction, by default, starts from zero. The model doesn't remember that your team patched a critical auth vulnerability last November. It doesn't know that a particular vendor's SLA has been breached twice. It has no concept of your organization's evolving compliance posture.
For a consumer chatbot, this is fine. For an enterprise decision intelligence platform—one where operators are asking questions like "should we approve this third-party data processor?"—statelessness is a real failure mode. You end up relitigating the same decisions. You miss pattern recognition across incidents. Institutional knowledge lives in chat logs nobody reads.
The standard solution is to dump everything into a system prompt. This breaks down fast. Context windows have limits. More importantly, you don't want to inject everything—you want to inject relevant things. That's a retrieval problem, not a context-stuffing problem.
Hindsight: Semantic Retrieval for Operational Memory
We integrated the Hindsight memory system as the persistence layer for SentinelOps AI. The architecture is simple: critical decisions, incidents, and governance facts are extracted from AI interactions and embedded into a vector database via Hindsight. Future queries perform a similarity search against that store and inject the top-k results into the prompt as context.
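To make the retrieval step concrete, here is a minimal in-memory sketch of top-k similarity search. It is not how Hindsight works internally: the `cosineSimilarity` and `topK` helpers and the 3-dimensional vectors are illustrative assumptions, and real embeddings would come from an embedding model.

```javascript
// Illustrative stand-in for a vector store. Vectors are made up for the example.

// Cosine similarity between two equal-length numeric vectors.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Score every stored memory against the query vector and keep the k best.
function topK(queryVector, memories, k = 5) {
  return memories
    .map(m => ({ ...m, score: cosineSimilarity(queryVector, m.vector) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}

const sampleMemories = [
  { content: 'Key rotation gap found on provider A', vector: [0.9, 0.1, 0.0] },
  { content: 'Vendor SLA breached', vector: [0.0, 0.2, 0.9] },
  { content: 'Auth vulnerability patched', vector: [0.1, 0.9, 0.1] },
];

// A query vector close to the first memory ranks it first.
const hits = topK([1, 0, 0], sampleMemories, 2);
console.log(hits.map(h => h.content));
```

Only the top-ranked memories reach the prompt, which is what separates this from context stuffing.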
Our memory service wraps the Hindsight client like this:
```javascript
import { HindsightClient } from '@vectorize-io/hindsight-client';

const hindsight = new HindsightClient({
  url: process.env.HINDSIGHT_URL,
  namespace: 'sentinelops-enterprise',
});

// Fetch the topK memories most similar to the query and normalize their shape.
async function recallRelevantContext(query, topK = 5) {
  const results = await hindsight.recall({
    query,
    topK,
    filters: { namespace: 'sentinelops-enterprise' },
  });

  return results.map(r => ({
    content: r.content,
    similarity: r.score,
    timestamp: r.metadata.timestamp,
    incident_id: r.metadata.incident_id ?? null,
  }));
}
```
Before any query hits the LLM, we call recallRelevantContext. The results—serialized as a structured block—get injected into the system prompt:
```javascript
function buildSystemPrompt(recalledMemories) {
  // Render each memory as a bullet with its timestamp and similarity score.
  const memoryBlock = recalledMemories.length > 0
    ? `## Relevant Organizational History\n${
        recalledMemories
          .map(m => `- [${m.timestamp}] ${m.content} (similarity: ${m.similarity.toFixed(2)})`)
          .join('\n')
      }`
    : '';

  return `You are SentinelOps AI, an enterprise decision intelligence system.
${memoryBlock}

Respond only in the following JSON schema: { summary, risk_level, confidence, recommendation, tradeoffs, governance_flags, citations }`;
}
```
The LLM sees past incidents as first-class context. It can cite them. It can reason about patterns across them.
What Retention Looks Like
On the write side, after every significant interaction, we extract a memory-worthy summary and store it via Hindsight's retain API:
```javascript
// hashQuery is a helper defined elsewhere in the codebase (not shown here).
async function retainDecision(interaction) {
  const { query, response, metadata } = interaction;

  // Only retain high-signal interactions
  if (response.risk_level === 'LOW' && !response.governance_flags.length) return;

  await hindsight.retain({
    content: `Decision: ${response.summary}. Risk: ${response.risk_level}. Recommendation: ${response.recommendation}`,
    metadata: {
      timestamp: new Date().toISOString(),
      query_hash: hashQuery(query),
      risk_level: response.risk_level,
      incident_id: metadata.incident_id ?? null,
    },
  });
}
```
We intentionally skip retaining low-risk, flag-free interactions. The signal-to-noise ratio of your memory store matters. If you retain everything, retrieval quality degrades because every query pulls back a mix of critical incidents and routine lookups. Understanding what agent memory should and shouldn't store is the first architectural decision you need to make.
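Putting the read and write paths together, a single request might flow roughly as sketched here. This is an illustration under assumptions, not the production handler: `handleQuery` and the injected `recall`, `buildPrompt`, `callLLM`, and `retain` functions are hypothetical names, and the stubs stand in for the Hindsight-backed functions and the LLM client.

```javascript
// Hypothetical orchestration: recall -> prompt -> LLM -> retain.
// Dependencies are injected so the flow can run against stubs.
async function handleQuery(query, deps) {
  const { recall, buildPrompt, callLLM, retain } = deps;

  const memories = await recall(query);           // read path
  const systemPrompt = buildPrompt(memories);
  const raw = await callLLM(systemPrompt, query); // expected to return JSON text

  let response;
  try {
    response = JSON.parse(raw);
  } catch {
    throw new Error('LLM did not return valid JSON');
  }

  await retain({ query, response, metadata: {} }); // write path
  return response;
}

// Stubs for illustration only; all values are made up.
const retained = [];
const stubDeps = {
  recall: async () => [
    { timestamp: '2024-01-01T00:00:00Z', content: 'Prior vendor review', similarity: 0.91 },
  ],
  buildPrompt: memories => `History:\n${memories.map(m => `- ${m.content}`).join('\n')}`,
  callLLM: async () =>
    JSON.stringify({ summary: 'Needs review', risk_level: 'HIGH', governance_flags: [] }),
  retain: async interaction => {
    const { response } = interaction;
    // Same filter as above: skip low-risk, flag-free interactions.
    if (response.risk_level === 'LOW' && !response.governance_flags.length) return;
    retained.push(interaction);
  },
};

handleQuery('Should we approve this processor?', stubDeps)
  .then(r => console.log(r.risk_level, retained.length));
```

Injecting the dependencies keeps the handler testable without a live memory store or model.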
The Behavioral Change
Before memory, the system answered every query in isolation. After memory, its behavior changed in two visible ways.

It stopped repeating itself on recurring patterns. We had a recurring pattern of queries about a specific vendor's data residency configuration. Without memory, the AI gave the same baseline recommendation every time. With memory, by the third query on the same vendor, the system was surfacing its own prior recommendations and noting that the issue had been assessed twice before without resolution—which is a materially different, more useful answer.
It started making connections we hadn't explicitly made. A query about a new cloud provider's encryption configuration surfaced a three-month-old incident about inadequate key rotation on a different provider. The similarity wasn't lexical; both involved "encryption at rest" and "SOC2 scope." The semantic retrieval made a connection that full-text search wouldn't have caught.
What Breaks
Semantic memory is not free of failure modes.
Stale context. If you retain a policy interpretation that later becomes outdated (a regulation changes, a vendor patches a known issue), that stale memory will surface and potentially mislead. We handle this with a TTL on certain memory categories and by including the timestamp in every memory injection so the LLM can reason about recency.
Hallucinated citations. When you inject memory context and ask the model to cite it, some models will occasionally confabulate citation details, inventing an incident ID or timestamp that doesn't exist in the retrieved memory. We validate citations against the returned memory IDs before rendering them in the UI.
Retrieval cold start. A fresh deployment has no memory. The system is strictly worse than a well-prompted zero-shot model until enough critical interactions have been retained. Plan for a seeding phase where you manually retain historical incidents and decisions.
Lessons
Filter what you retain. Retaining every interaction pollutes the memory store. Only retain high-signal interactions—incidents, policy flags, governance decisions.
Include timestamps in every memory injection. The LLM needs to reason about recency. "This was assessed 18 months ago" is important context that pure semantic similarity doesn't capture.
Validate citations. Don't render AI-generated citations without checking them against the retrieved memory. The model will occasionally invent plausible-sounding references.
Memory is a product feature, not just infrastructure. The UX implication of memory is that users start to trust the system more, and hold it to a higher standard when it misses something. Set expectations clearly about what the system remembers and how long it retains it.
The Hindsight documentation covers the retain/recall API in detail. If you're building any system where decisions compound over time (and operational systems always do), stateless LLMs are the wrong foundation. Memory isn't a feature; it's a prerequisite.
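Two of the safeguards described under What Breaks, expiring stale memories and validating citations, can be sketched as follows. The post doesn't show this code, so treat all of it as an assumption: the `id` and `category` fields, the TTL values, and the `dropExpired` and `validateCitations` names are hypothetical.

```javascript
const DAY_MS = 24 * 60 * 60 * 1000;

// Hypothetical per-category TTLs in days. Categories without an entry never expire.
const TTL_DAYS = { policy_interpretation: 180, vendor_issue: 90 };

// Drop memories older than their category's TTL, measured relative to `now`.
function dropExpired(memories, now = new Date()) {
  return memories.filter(m => {
    const ttl = TTL_DAYS[m.category];
    if (ttl === undefined) return true;
    const ageMs = now.getTime() - new Date(m.timestamp).getTime();
    return ageMs <= ttl * DAY_MS;
  });
}

// Split the model's citations into IDs that match a retrieved memory and IDs that don't.
function validateCitations(citations, retrievedMemories) {
  const knownIds = new Set(retrievedMemories.map(m => m.id));
  return {
    valid: citations.filter(id => knownIds.has(id)),
    rejected: citations.filter(id => !knownIds.has(id)),
  };
}

// Sample data, made up for the example.
const now = new Date('2025-06-01T00:00:00Z');
const retrieved = [
  { id: 'mem-1', category: 'vendor_issue', timestamp: '2025-05-01T00:00:00Z' },
  { id: 'mem-2', category: 'vendor_issue', timestamp: '2024-11-01T00:00:00Z' },
  { id: 'mem-3', category: 'incident', timestamp: '2023-01-01T00:00:00Z' },
];

// mem-2 is older than 90 days and is dropped; mem-3 has no TTL and stays.
const fresh = dropExpired(retrieved, now);

// 'mem-9' was never retrieved, so it lands in `rejected` and should not be rendered.
const check = validateCitations(['mem-1', 'mem-9'], fresh);
console.log(fresh.map(m => m.id), check);
```

Running the expiry filter before the citation check means a citation to an expired memory is rejected too, which may or may not be what you want.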

DEV Community
