
kirandeepjassal-crypto

Posted on • Originally published at prepstack.co.in

Context Engineering for Enterprise AI, Part 2: The Memory Layer That Makes Agents Useful


Your AI agent forgets everything the moment a request ends. That's not a model limitation — it's a missing memory layer. This is Part 2 of my Context Engineering series.

The reframe

The model is a stateless function. Memory is an enterprise database with embeddings bolted on — owned, scoped, audited, and forgettable. Not a chat transcript you keep pasting back into the prompt.
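
A minimal sketch of that reframe, assuming an in-memory list stands in for the real SQL and vector stores. The names `MemoryRecord`, `MemoryStore`, and `answer` are hypothetical and do not come from the full article. The point is only that scope, audit, and deletion live in the store, while the model call stays a pure function of its inputs.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone


@dataclass
class MemoryRecord:
    tenant_id: str  # owner scope: which customer the memory belongs to
    user_id: str    # owner scope: which user inside that tenant
    text: str
    created_at: datetime = field(
        default_factory=lambda: datetime.now(timezone.utc)
    )


class MemoryStore:
    """Illustrative stand-in for the system of record (not the real schema)."""

    def __init__(self) -> None:
        self._records: list[MemoryRecord] = []
        # Each audit entry is (action, tenant_id, user_id).
        self.audit_log: list[tuple[str, str, str]] = []

    def write(self, record: MemoryRecord) -> None:
        self._records.append(record)
        self.audit_log.append(("write", record.tenant_id, record.user_id))

    def read(self, tenant_id: str, user_id: str) -> list[MemoryRecord]:
        # Scope is enforced here in code, never delegated to the prompt.
        self.audit_log.append(("read", tenant_id, user_id))
        return [
            r for r in self._records
            if r.tenant_id == tenant_id and r.user_id == user_id
        ]

    def forget_user(self, tenant_id: str, user_id: str) -> int:
        """Delete every record for one user; returns how many were removed."""
        before = len(self._records)
        self._records = [
            r for r in self._records
            if not (r.tenant_id == tenant_id and r.user_id == user_id)
        ]
        self.audit_log.append(("forget", tenant_id, user_id))
        return before - len(self._records)


def answer(prompt: str, memories: list[MemoryRecord]) -> str:
    """Stand-in for a stateless model call: output depends only on inputs."""
    context = " | ".join(m.text for m in memories)
    return f"[context: {context}] {prompt}"
```

Calling `answer` twice with the same prompt and the same recalled memories gives the same result. Anything the agent "remembers" between requests has to come back through `MemoryStore.read`.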

The architecture

Built across ASP.NET Core (system-of-record + governance on Azure SQL) and a Python FastAPI service (embeddings + semantic recall on Azure AI Search):

  1. Tiered memory — working → short-term (Redis + SQL) → long-term episodic + semantic (vector index)
  2. A salience-gated write policy — store what matters, not every turn (long-term writes cut ~85%)
  3. Retrieval blends similarity + recency, packed into the Part 1 token budget
  4. Tenant + user as hard query filters — never prompt instructions
  5. Right-to-be-forgotten that fans out to SQL + vectors + cache in under 5 minutes (GDPR Art. 17)
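
Points 2 to 4 can be sketched together in a few lines. This is a rough illustration, not the production code: the threshold, half-life, and weight values are hypothetical, the word-count token estimate is a crude placeholder, and the real service runs the filter inside the vector index query, not over a Python list.

```python
import math
from dataclasses import dataclass


@dataclass
class Memory:
    tenant_id: str
    user_id: str
    text: str
    embedding: list[float]
    age_hours: float


# All three constants are illustrative assumptions, not values from the article.
SALIENCE_THRESHOLD = 0.6     # minimum salience for a long-term write
RECENCY_HALF_LIFE_H = 72.0   # recency score halves every 72 hours
SIMILARITY_WEIGHT = 0.7      # remaining 0.3 goes to recency


def should_store(salience: float) -> bool:
    """Salience gate: only turns scored above the threshold are persisted."""
    return salience >= SALIENCE_THRESHOLD


def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def blended_score(query_vec: list[float], m: Memory) -> float:
    """Weighted mix of semantic similarity and exponential recency decay."""
    recency = 0.5 ** (m.age_hours / RECENCY_HALF_LIFE_H)
    similarity = cosine(query_vec, m.embedding)
    return SIMILARITY_WEIGHT * similarity + (1 - SIMILARITY_WEIGHT) * recency


def recall(store: list[Memory], tenant_id: str, user_id: str,
           query_vec: list[float], token_budget: int) -> list[Memory]:
    # Hard filter first: other tenants' rows are never even scored.
    candidates = [
        m for m in store
        if m.tenant_id == tenant_id and m.user_id == user_id
    ]
    ranked = sorted(candidates,
                    key=lambda m: blended_score(query_vec, m),
                    reverse=True)
    packed, used = [], 0
    for m in ranked:
        cost = len(m.text.split())  # crude token estimate (assumption)
        if used + cost > token_budget:
            continue  # skip what does not fit, keep trying smaller items
        packed.append(m)
        used += cost
    return packed
```

Because the tenant and user check happens before ranking, no prompt wording can widen the result set. The worst a prompt injection can do is reorder or drop memories the caller was already entitled to see.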

The results

| Metric | Outcome |
| --- | --- |
| Context tokens at turn 30 | ~3,500 (vs ~14,000 history-stuffing) |
| Cost per query | $0.021 → $0.008 |
| Cross-tenant leaks (red-team) | 0 |
| Memory retrieval p95 | 74 ms |
| Cross-session continuity | resumes the prior thread, no re-explaining |
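
To put the two before/after figures above in relative terms, here is a quick calculation that uses only the reported numbers. The helper name `relative_reduction` is made up for this example.

```python
def relative_reduction(before: float, after: float) -> float:
    """Fraction by which a quantity dropped, e.g. 0.75 means 75% lower."""
    return (before - after) / before


# Figures taken directly from the results above.
token_cut = relative_reduction(14_000, 3_500)
cost_cut = relative_reduction(0.021, 0.008)

print(f"context tokens at turn 30: {token_cut:.0%} fewer")  # 75% fewer
print(f"cost per query: {cost_cut:.0%} lower")              # 62% lower
```

The cost drop is smaller than the token drop, which is what you would expect if part of the per-query cost (output tokens, retrieval, fixed overhead) does not shrink with the context. The post does not break that down, so treat this as a guess.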

The model is stateless. Memory is infrastructure you own — scoped, audited, and forgettable.

Read the full breakdown — with all the C# and Python code — on PrepStack:
https://prepstack.co.in/blog/context-engineering-enterprise-genai-part-2-memory-layer
