What started as a weekend side project over a beer turned into something much bigger than expected.
About a month ago, I started building Memzent AI for fun: an intelligent semantic proxy that sits between AI agents and LLMs. The original idea was simple:
"Don't pay for the same answer twice."
Use semantic caching to recognize similar prompts and return previously generated responses instead of calling an expensive model again.
Then I hit the real problem.
Consider these two prompts:
→ Transfer $100 from account 123 to account 456
→ Transfer $100 from account 456 to account 123
A semantic search engine sees them as nearly identical: same words, same amount, only the order of the accounts differs.
A production system absolutely should not.
That realization led to what became the Evolution Pipeline — a series of deterministic safety and optimization layers that run before an LLM is ever called.
🔬 E1: Entity Extraction
Extracts critical entities with directional awareness (source vs destination, sender vs receiver, etc.).
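A minimal sketch of directional extraction, assuming a single regex rule for one phrasing of a transfer request. The `Transfer` struct, `transferPattern`, and `extractTransfer` are hypothetical names, not Memzent's API; a production extractor would cover many phrasings or use a trained model. The point is that source and destination land in separate fields, so the two prompts from the example no longer compare as equal.

```go
package main

import (
	"fmt"
	"regexp"
)

// Transfer holds the entities that matter for correctness. Source and
// Destination are separate fields so direction is never lost.
type Transfer struct {
	Amount      string
	Source      string
	Destination string
}

// transferPattern is a hypothetical rule covering one phrasing only.
var transferPattern = regexp.MustCompile(
	`(?i)transfer\s+\$(\d+(?:\.\d{1,2})?)\s+from\s+account\s+(\d+)\s+to\s+account\s+(\d+)`)

// extractTransfer returns the directional entities, or false if the prompt
// does not match the pattern.
func extractTransfer(prompt string) (Transfer, bool) {
	m := transferPattern.FindStringSubmatch(prompt)
	if m == nil {
		return Transfer{}, false
	}
	return Transfer{Amount: m[1], Source: m[2], Destination: m[3]}, true
}

func main() {
	a, _ := extractTransfer("Transfer $100 from account 123 to account 456")
	b, _ := extractTransfer("Transfer $100 from account 456 to account 123")
	fmt.Printf("%+v\n%+v\nsame request: %v\n", a, b, a == b)
}
```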
⚡ E2: L1b Hot Path Cache
Entity-keyed lookups in Valkey for sub-millisecond response times without vector searches.
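One way an entity-keyed hot path could work is to build a deterministic key from the extracted entities and do a plain key-value lookup. In this sketch a Go map stands in for Valkey, and the key layout (including the `l1b:transfer:` prefix) is a hypothetical choice, not the project's actual schema. Because field names are written into the key, swapping source and destination always yields a different key, and no vector search is involved.

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

type Transfer struct {
	Amount, Source, Destination string
}

// hotPathKey builds a deterministic cache key from the extracted entities.
// Hashing keeps keys a fixed length regardless of the entity values.
func hotPathKey(t Transfer) string {
	canonical := fmt.Sprintf("intent=transfer|amount=%s|src=%s|dst=%s",
		t.Amount, t.Source, t.Destination)
	sum := sha256.Sum256([]byte(canonical))
	return "l1b:transfer:" + hex.EncodeToString(sum[:])
}

func main() {
	// A plain map stands in for Valkey; in production this would be a
	// GET/SET against the key-value store.
	cache := map[string]string{}
	fwd := Transfer{"100", "123", "456"}
	rev := Transfer{"100", "456", "123"}

	cache[hotPathKey(fwd)] = "OK: moved $100 from 123 to 456"

	if resp, ok := cache[hotPathKey(fwd)]; ok {
		fmt.Println("hit:", resp)
	}
	if _, ok := cache[hotPathKey(rev)]; !ok {
		fmt.Println("miss for reversed transfer")
	}
}
```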
📊 E3: Offline Learning Plane
Asynchronous telemetry mining designed to be PII-safe and production-friendly.
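A common way to keep telemetry PII-safe is to redact sensitive values before a prompt ever leaves the request path, so the offline plane only sees the shape of the request. This is a minimal sketch under that assumption; the two regex rules are hypothetical and only cover the account numbers and dollar amounts from the example prompts.

```go
package main

import (
	"fmt"
	"regexp"
)

// Hypothetical redaction rules for the example prompts only.
var (
	amountRe  = regexp.MustCompile(`\$\d+(?:\.\d{1,2})?`)
	accountRe = regexp.MustCompile(`(?i)(account\s+)\d+`)
)

// scrub replaces amounts and account numbers with placeholders, keeping the
// surrounding words (and their order) intact.
func scrub(prompt string) string {
	out := amountRe.ReplaceAllString(prompt, "<AMOUNT>")
	out = accountRe.ReplaceAllString(out, "${1}<ACCOUNT>")
	return out
}

func main() {
	fmt.Println(scrub("Transfer $100 from account 123 to account 456"))
}
```

Note that both transfer directions scrub to the same template. That is fine for mining request shapes offline, and it is exactly why correctness decisions on the hot path should key on the raw extracted entities rather than on scrubbed text.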
🔄 E4: Workflow Registry
Automatically discovers recurring workflows and reusable execution patterns.
📈 E5: GPU Avoidance Rate
Our primary metric: the percentage of requests resolved without touching an LLM.
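The metric itself is a simple ratio: requests answered without an LLM call divided by total requests. The sketch below assumes each request is tagged with a hypothetical `Outcome`; the three tiers are illustrative labels, and the traffic sample in `main` is made up purely to show the arithmetic.

```go
package main

import "fmt"

// Outcome records how a single request was resolved (hypothetical tiers).
type Outcome int

const (
	HotPathHit  Outcome = iota // answered from the entity-keyed cache
	SemanticHit                // answered from the vector cache
	LLMCall                    // had to call a model
)

// gpuAvoidanceRate is the fraction of requests resolved without an LLM call.
func gpuAvoidanceRate(outcomes []Outcome) float64 {
	if len(outcomes) == 0 {
		return 0
	}
	avoided := 0
	for _, o := range outcomes {
		if o != LLMCall {
			avoided++
		}
	}
	return float64(avoided) / float64(len(outcomes))
}

func main() {
	// Illustrative sample only: 3 of 4 requests avoid the model.
	sample := []Outcome{HotPathHit, HotPathHit, SemanticHit, LLMCall}
	fmt.Printf("GPU avoidance rate: %.0f%%\n", 100*gpuAvoidanceRate(sample))
}
```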
🧠 E6: Pattern Mining & Pre-Warming
Learns common request sequences and proactively prepares cache paths before they're needed.
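A simple way to learn request sequences is to count which request template tends to follow which within a session, then pre-warm the most frequent follower. This first-order transition model is a swapped-in technique for illustration, not necessarily what E6 does; the template names and the `minCount` confidence threshold are hypothetical.

```go
package main

import "fmt"

// sequenceMiner counts how often request template B directly follows
// template A within a session.
type sequenceMiner struct {
	next map[string]map[string]int
}

func newSequenceMiner() *sequenceMiner {
	return &sequenceMiner{next: make(map[string]map[string]int)}
}

// Observe records one session's ordered list of request templates.
func (m *sequenceMiner) Observe(session []string) {
	for i := 0; i+1 < len(session); i++ {
		a, b := session[i], session[i+1]
		if m.next[a] == nil {
			m.next[a] = make(map[string]int)
		}
		m.next[a][b]++
	}
}

// Predict returns the most frequent follower of current if it was seen at
// least minCount times. Ties resolve to the lexicographically smaller name so
// the result is deterministic.
func (m *sequenceMiner) Predict(current string, minCount int) (string, bool) {
	best, bestN := "", 0
	for cand, n := range m.next[current] {
		if n > bestN || (n == bestN && cand < best) {
			best, bestN = cand, n
		}
	}
	return best, bestN > 0 && bestN >= minCount
}

func main() {
	m := newSequenceMiner()
	m.Observe([]string{"check_balance", "transfer", "get_receipt"})
	m.Observe([]string{"check_balance", "transfer", "get_receipt"})
	m.Observe([]string{"check_balance", "list_transactions"})

	// When a session issues check_balance, pre-warm whatever usually follows.
	if nxt, ok := m.Predict("check_balance", 2); ok {
		fmt.Println("pre-warm cache path for:", nxt)
	}
}
```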
The stack today:
- Go Gateway
- Rust gRPC Router
- Qdrant Vector Engine
- Valkey Cache
- Next.js Dashboard
Built entirely on nights and weekends alongside a full-time job.
The more we worked on it, the more it evolved from "semantic caching" into something closer to an intelligent AI request router — one that understands when an expensive LLM call can be safely avoided.
It's fully open source and still evolving.
I'd genuinely love feedback from engineers working on AI infrastructure, agents, RAG systems, caching layers, gateways, or inference optimization.
What's over-engineered?
What's missing?
What would you build differently?
Star it. Break it. Roast it.
GitHub: GitHub
Docs: https://lnkd.in/gwfc8qXu
Website: https://memzent.ai
Intro blog: https://lnkd.in/gQrqVHRt
Great work, Opsylux team: @maninampally, Jagan MRP, Madhuri Vilasagaram, Manoj V, Opsylux LLC, Memzent.AI
Top comment:
Semantic caching is a good first version of agent memory because it starts with a measurable promise: avoid paying twice for equivalent work.
The next challenge is usually invalidation. If the repo, policy, or user preference changes, the memory layer has to know when a similar answer is no longer a safe answer.