Nagaraju Nampally

Posted on Jun 15 • Edited on Jun 16

The memory layer for your agent

#ai #agents #productivity #opensource

What started as a weekend side project over a beer turned into something much bigger than expected.

About a month ago, I started building Memzent AI for fun: an intelligent semantic proxy that sits between AI agents and LLMs. The original idea was simple:

"Don't pay for the same answer twice."

Use semantic caching to recognize similar prompts and return previously generated responses instead of calling an expensive model again.

Then I hit "the" problem.

Consider these two prompts:

→ Transfer $100 from account 123 to account 456
→ Transfer $100 from account 456 to account 123

A semantic search engine sees them as nearly identical.

A production system absolutely should not.
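
To make the problem concrete, here is a minimal sketch that scores the two prompts with cosine similarity. It uses a bag-of-words vector as a crude stand-in for an embedding model (an assumption for illustration, not what Memzent uses). Real embeddings do encode some word order, so they would typically score these prompts as very close rather than exactly identical.

```python
import math
import re
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercased token counts; a crude stand-in for an embedding model."""
    return Counter(re.findall(r"[a-z0-9$]+", text.lower()))


def cosine_similarity(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


p1 = "Transfer $100 from account 123 to account 456"
p2 = "Transfer $100 from account 456 to account 123"

similarity = cosine_similarity(bag_of_words(p1), bag_of_words(p2))
print(round(similarity, 6))  # 1.0: same tokens, opposite meaning
```

Any similarity threshold loose enough to catch paraphrases will also accept this pair, which is why similarity alone cannot be the final gate.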

That realization led to what became the Evolution Pipeline — a series of deterministic safety and optimization layers that run before an LLM is ever called.

🔬 E1: Entity Extraction
Extracts critical entities with directional awareness (source vs destination, sender vs receiver, etc.).
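
A minimal sketch of what directional extraction could look like for the transfer example. The regex, the `TransferEntities` type, and `extract_transfer` are hypothetical names invented for this illustration; the actual extractor may work quite differently.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class TransferEntities:
    amount: str
    source: str        # the account money leaves
    destination: str   # the account money arrives in


# Hypothetical pattern covering one phrasing of one intent.
TRANSFER_RE = re.compile(
    r"transfer\s+\$?(?P<amount>\d+(?:\.\d+)?)\s+"
    r"from\s+account\s+(?P<source>\w+)\s+"
    r"to\s+account\s+(?P<destination>\w+)",
    re.IGNORECASE,
)


def extract_transfer(prompt: str) -> Optional[TransferEntities]:
    """Return role-tagged entities, or None if the prompt does not match."""
    match = TRANSFER_RE.search(prompt)
    if match is None:
        return None
    return TransferEntities(match["amount"], match["source"], match["destination"])


a = extract_transfer("Transfer $100 from account 123 to account 456")
b = extract_transfer("Transfer $100 from account 456 to account 123")
print(a == b)  # False: same values, different roles
```

Because each value is stored under its role, the two prompts produce unequal records even though they contain the same tokens.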

⚡ E2: L1b Hot Path Cache
Entity-keyed lookups in Valkey for sub-millisecond response times without vector searches.
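
One way to picture an entity-keyed lookup, as a sketch only: build a deterministic key from the intent plus the role-tagged entities, then do an exact-match fetch. A plain dict stands in for Valkey here, and the `l1b:` prefix, `entity_key`, and `HotPathCache` are assumptions made for the example.

```python
import hashlib
from typing import Dict, Optional


def entity_key(intent: str, entities: Dict[str, str]) -> str:
    """Deterministic key: roles are sorted so dict ordering cannot change it."""
    parts = [f"{role}={value}" for role, value in sorted(entities.items())]
    canonical = intent + "|" + "|".join(parts)
    return "l1b:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class HotPathCache:
    """Exact-match cache; the dict is a stand-in for a Valkey GET/SET."""

    def __init__(self) -> None:
        self._store: Dict[str, str] = {}

    def put(self, intent: str, entities: Dict[str, str], response: str) -> None:
        self._store[entity_key(intent, entities)] = response

    def get(self, intent: str, entities: Dict[str, str]) -> Optional[str]:
        return self._store.get(entity_key(intent, entities))


cache = HotPathCache()
cache.put("transfer", {"amount": "100", "source": "123", "destination": "456"}, "ok-123-to-456")

hit = cache.get("transfer", {"destination": "456", "source": "123", "amount": "100"})
miss = cache.get("transfer", {"amount": "100", "source": "456", "destination": "123"})
print(hit, miss)  # ok-123-to-456 None
```

The reversed transfer hashes to a different key, so it misses without any vector search being involved.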

📊 E3: Offline Learning Plane
Asynchronous telemetry mining designed to be PII-safe and production-friendly.

🔄 E4: Workflow Registry
Automatically discovers recurring workflows and reusable execution patterns.

📈 E5: GPU Avoidance Rate
Our primary metric: how many requests are resolved without touching an LLM.
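
Read as a rate, this would be the fraction of requests that never reach the model. The sketch below assumes that definition; the function name and the sample numbers are invented for illustration and are not measured results.

```python
def gpu_avoidance_rate(total_requests: int, llm_calls: int) -> float:
    """Fraction of requests served without an LLM call (assumed definition)."""
    if total_requests <= 0:
        return 0.0
    if not 0 <= llm_calls <= total_requests:
        raise ValueError("llm_calls must be between 0 and total_requests")
    return (total_requests - llm_calls) / total_requests


# Hypothetical traffic: 1000 requests, 350 of which still needed the model.
rate = gpu_avoidance_rate(total_requests=1000, llm_calls=350)
print(f"{rate:.0%}")  # 65%
```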

🧠 E6: Pattern Mining & Pre-Warming
Learns common request sequences and proactively prepares cache paths before they're needed.
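
As a rough sketch of the idea, a first-order transition counter can learn which request type tends to follow another and suggest what to pre-warm. `SequenceMiner`, the `min_support` threshold, and the intent labels are hypothetical; the real miner is presumably more sophisticated.

```python
from collections import Counter, defaultdict
from typing import Dict, List, Optional


class SequenceMiner:
    """Counts 'A is followed by B' transitions across observed sessions."""

    def __init__(self, min_support: int = 2) -> None:
        self._transitions: Dict[str, Counter] = defaultdict(Counter)
        self._min_support = min_support

    def observe(self, session: List[str]) -> None:
        """Record every adjacent pair of request types in one session."""
        for current, following in zip(session, session[1:]):
            self._transitions[current][following] += 1

    def predict_next(self, current: str) -> Optional[str]:
        """Most frequent follower, if it was seen at least min_support times."""
        followers = self._transitions.get(current)
        if not followers:
            return None
        candidate, count = followers.most_common(1)[0]
        return candidate if count >= self._min_support else None


miner = SequenceMiner(min_support=2)
miner.observe(["check_balance", "transfer", "confirm"])
miner.observe(["check_balance", "transfer", "confirm"])
miner.observe(["check_balance", "history"])

print(miner.predict_next("check_balance"))  # transfer
```

When a `check_balance` request arrives, the proxy could use the prediction to prepare the cache path for `transfer` ahead of time.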

The stack today:

Go Gateway
Rust gRPC Router
Qdrant Vector Engine
Valkey Cache
Next.js Dashboard

Built entirely in nights and weekends alongside a full-time job.

The more we worked on it, the more it evolved from "semantic caching" into something closer to an intelligent AI request router — one that understands when an expensive LLM call can be safely avoided.

It's fully open source and still evolving.

I'd genuinely love feedback from engineers working on AI infrastructure, agents, RAG systems, caching layers, gateways, or inference optimization.

What's over-engineered?
What's missing?
What would you build differently?

Star it. Break it. Roast it.

GitHub: GitHub
Docs: https://lnkd.in/gwfc8qXu
Website: https://memzent.ai
Intro blog: https://lnkd.in/gQrqVHRt

Great work, Opsylux team: @maninampally, Jagan MRP, Madhuri Vilasagaram, Manoj V, Opsylux LLC, Memzent.AI

Top comments (4)

Alex Shev • Jun 15

Semantic caching is a good first version of agent memory because it starts with a measurable promise: avoid paying twice for equivalent work.

The next challenge is usually invalidation. If the repo, policy, or user preference changes, the memory layer has to know when a similar answer is no longer a safe answer.

Nagaraju Nampally • Jun 16 • Edited

Thank you for the callout. Invalidation is the exact engineering hurdle that separates a naive wrapper from a robust production memory layer.

To give you some insight into how we are approaching this with Memzent, we attack the problem at a few different layers:

What’s live today:

Entity-Aware Cache Guard: We run an Evolution Pipeline to extract structural and directional entities prior to vector lookup. This stops the cache from conflating statements like "transfer $100 from A→B" and "transfer $100 from B→A", which baseline cosine similarity would typically match.
Tenant & Model Isolation: Cache keys are fully namespaced by organization, environment, and model to strictly enforce multi-tenant isolation.
TTL & Forced Clears: A default 1-hour rolling TTL catches natural decay, paired with pattern-based flushing for schema or config deployments.
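
The namespacing and TTL points above can be sketched as follows. This is an illustration under assumptions, not the production code: the key layout, the class name, and the reading of "rolling" as "refreshed on every hit" are all guesses, and an in-memory dict stands in for the real store.

```python
import time
from typing import Callable, Dict, Optional, Tuple

DEFAULT_TTL_SECONDS = 3600  # the 1-hour default mentioned above


def namespaced_key(org: str, env: str, model: str, entry_id: str) -> str:
    """Hypothetical key layout: organization, environment, model, entry."""
    return f"{org}:{env}:{model}:{entry_id}"


class RollingTTLCache:
    def __init__(self, ttl: float = DEFAULT_TTL_SECONDS,
                 clock: Callable[[], float] = time.monotonic) -> None:
        self._ttl = ttl
        self._clock = clock
        self._store: Dict[str, Tuple[float, str]] = {}

    def put(self, key: str, value: str) -> None:
        self._store[key] = (self._clock() + self._ttl, value)

    def get(self, key: str) -> Optional[str]:
        item = self._store.get(key)
        if item is None:
            return None
        expires_at, value = item
        now = self._clock()
        if now >= expires_at:
            del self._store[key]  # expired: drop it and report a miss
            return None
        self._store[key] = (now + self._ttl, value)  # rolling: a hit extends the TTL
        return value

    def flush_prefix(self, prefix: str) -> int:
        """Pattern-based clear, e.g. after a schema or config deployment."""
        doomed = [k for k in self._store if k.startswith(prefix)]
        for k in doomed:
            del self._store[k]
        return len(doomed)


cache = RollingTTLCache()
cache.put(namespaced_key("acme", "prod", "model-a", "abc"), "cached answer")
print(cache.get(namespaced_key("globex", "prod", "model-a", "abc")))  # None
```

The same entry id under a different organization is a different key, so one tenant can never read another tenant's cached response.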

What's on the immediate roadmap:

Event-Driven Invalidation via MCP: Leveraging the Model Context Protocol ecosystem to listen to tool-layer changes. If a connected repo gets a commit or a database schema migrates, those events will automatically target and bust related semantic entries.
Version-Tagged Cache Keys: Tying cache entries directly to a policy or system configuration version hash. A change to the policy doc immediately invalidates downstream cache keys without requiring manual database intervention.
Preference Drift Detection: Identifying real-time shifts in user context or state (e.g., switching tech stacks or roles) and proactively treating historically strong semantic matches as stale.
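
Since version-tagged keys are still on the roadmap, the following is only a sketch of how the idea could work: fold a short hash of the policy document into every cache key, so that editing the policy makes old entries unreachable. The function names and the `v=` suffix format are hypothetical.

```python
import hashlib


def version_tag(policy_text: str) -> str:
    """Short content hash of the policy or configuration document."""
    return hashlib.sha256(policy_text.encode("utf-8")).hexdigest()[:12]


def versioned_key(base_key: str, policy_text: str) -> str:
    """Append the policy version so a policy change yields a new key."""
    return f"{base_key}:v={version_tag(policy_text)}"


old_policy = "Refunds are allowed within 30 days."
new_policy = "Refunds are allowed within 14 days."

key_before = versioned_key("acme:prod:model-a:refund-question", old_policy)
key_after = versioned_key("acme:prod:model-a:refund-question", new_policy)
print(key_before == key_after)  # False: the old entry is never looked up again
```

Entries written under the old version are not deleted by this scheme; they simply stop being hit and can age out through the TTL.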

Our goal is to build a semantic layer that knows when it's wrong, not just when it's old. Really appreciate the high-signal comment—this is exactly where the interesting infrastructure work is happening right now. 🙌

Alex Shev • Jun 16

This is a strong direction. Entity-aware cache guards and version-tagged keys are the two pieces that make semantic memory feel less magical and more operational. The event-driven invalidation via MCP is especially interesting because stale memory usually comes from external state changing, not the embedding being bad.

Alex Shev • Jun 17

The invalidation point is huge. Memory is only useful if the system can tell when a stored preference is stale, contradicted, or too narrow for the current task.

I would rather have fewer memories with provenance and expiry than a giant pile of confident-sounding notes. Retrieval without a freshness check is where assistants start acting haunted by old context.