---
title: "A-Modular-Kingdom - The Infrastructure Layer AI Agents Deserve"
published: true
description: "Production-ready MCP server with RAG, memory, and tools. Connect any AI agent to long-term memory, document retrieval, and 10+ powerful tools."
tags: ai, rag, mcp, python
canonical_url: https://masihmoafi.com/blog/a-modular-kingdom
---
# A-Modular-Kingdom

*The infrastructure layer AI agents deserve*

## Why I Built This
Every AI agent I built had the same problem: I kept rebuilding the same infrastructure from scratch.
- RAG system? Build it again.
- Long-term memory? Implement it again.
- Web search, code execution, vision? Wire them up again.
After the third project, I stopped. I extracted everything into a single, production-ready foundation that any agent can plug into via the Model Context Protocol (MCP).
A-Modular-Kingdom is that foundation.
## What It Does

Start the MCP server:

```bash
python src/agent/host.py
```
Now any AI agent—Claude Desktop, Gemini, custom chatbots—instantly gets:
- Document retrieval (RAG) with Qdrant + BM25 + CrossEncoder reranking
- Hierarchical memory that persists across sessions and projects
- 10+ tools: web search, browser automation, code execution, vision, TTS/STT
One server. Unlimited applications.
## The Tools

| Tool | What It Does |
|---|---|
| `query_knowledge_base` | Search documents with hybrid retrieval (vector + keyword + reranking) |
| `save_memory` | Store memories with automatic scope inference |
| `search_memories` | Retrieve with priority: global rules → preferences → project context |
| `save_fact` | Structured fact storage with metadata |
| `set_global_rule` | Persistent instructions across all sessions |
| `list_all_memories` | View everything stored |
| `delete_memory` | Remove by ID |
| `web_search` | DuckDuckGo integration |
| `browser_automation` | Playwright scraping (text + screenshots) |
| `code_execute` | Safe Python sandbox |
| `analyze_media` | Ollama vision for images/videos |
| `text_to_speech` | Multiple engines (pyttsx3, gtts, kokoro) |
| `speech_to_text` | Whisper transcription |
## RAG: Not Just Vector Search
Most RAG implementations are naive: embed documents, find nearest neighbors, return results. This works for demos. It fails in production.
A-Modular-Kingdom uses a three-stage pipeline:

*Anthropic's Contextual Retrieval, the inspiration for this RAG implementation.*
### Stage 1: Hybrid Retrieval
- Vector search (Qdrant Cloud) finds semantically similar chunks
- BM25 keyword search catches exact term matches vectors miss
### Stage 2: Ensemble Fusion
- Results from both methods are combined with configurable weights
- Neither method dominates—they complement each other
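The fusion step can be sketched with Reciprocal Rank Fusion (RRF), the scheme named in the V3 architecture below; this is a minimal illustration assuming each retriever returns a ranked list of chunk IDs, and the `rrf_fuse` helper and its weights are hypothetical, not the project's actual code:

```python
from collections import defaultdict

def rrf_fuse(ranked_lists, weights=None, k=60):
    """Reciprocal Rank Fusion: merge ranked lists from multiple retrievers
    (e.g. vector search and BM25). Each doc scores sum(w / (k + rank))."""
    weights = weights or [1.0] * len(ranked_lists)
    scores = defaultdict(float)
    for ranking, w in zip(ranked_lists, weights):
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += w / (k + rank)
    # Highest fused score first
    return sorted(scores, key=scores.get, reverse=True)

vector_hits = ["doc_a", "doc_b", "doc_c"]  # semantic neighbours
bm25_hits   = ["doc_c", "doc_a", "doc_d"]  # exact keyword matches
fused = rrf_fuse([vector_hits, bm25_hits])
print(fused[0])  # → doc_a (strong in both lists)
```

Rank-based fusion sidesteps score normalization: vector similarities and BM25 scores live on different scales, but ranks are always comparable.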
### Stage 3: CrossEncoder Reranking
- A cross-encoder model (ms-marco-MiniLM-L-6-v2) scores each result against the query
- The top 5 most relevant results are returned

*V3 RAG Architecture: hybrid retrieval with RRF fusion and CrossEncoder reranking.*
### The Numbers

**Accuracy:**

- Focused FAQ: 100%
- Real documents: 83-86%
- LLM-as-Judge: 84-98%

**Performance:**

- V2: 26.8s cold start, 0.31s warm query
- V3: 13.9s cold start, 0.02s warm query

**Supports:** Python, Markdown, PDF, Jupyter notebooks, JavaScript, TypeScript

*Anthropic's evaluation showing contextual retrieval improvements, the benchmark reference for this implementation.*
## Memory: Scoped and Hierarchical

*Memory architecture inspired by Mem0: hierarchical, scoped, and persistent.*
Flat memory systems don't scale. When you have hundreds of memories, search becomes noise.
A-Modular-Kingdom organizes memory into scopes:
| Scope | Persistence | Example |
|---|---|---|
| `global_rules` | Forever, all projects | "Always use type hints" |
| `global_preferences` | Forever, all projects | "Prefer concise responses" |
| `global_personas` | Forever, all projects | Reusable agent personalities |
| `project_context` | Current project only | "Uses FastAPI backend" |
### Smart Inference

You don't need to specify scopes manually. The system infers the scope from the content:

```python
save_memory("User prefers dark mode")  # → global_preferences
save_memory("Always validate input")   # → global_rules
save_memory("Uses PostgreSQL")         # → project_context
```
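Conceptually, inference like this can be as simple as keyword heuristics. Here's an illustrative sketch; the `infer_scope` helper and its rules are hypothetical, not the project's actual logic:

```python
def infer_scope(content: str) -> str:
    """Guess a memory scope from the text itself (illustrative heuristics)."""
    text = content.lower()
    if text.startswith(("always", "never")):  # imperatives read as rules
        return "global_rules"
    if "prefer" in text:                      # tastes, not requirements
        return "global_preferences"
    return "project_context"                  # default: project-local facts

print(infer_scope("Always validate input"))   # → global_rules
print(infer_scope("User prefers dark mode"))  # → global_preferences
print(infer_scope("Uses PostgreSQL"))         # → project_context
```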
### Priority Search

When you search, results come back in priority order:

1. Global rules (highest priority)
2. Global preferences
3. Global personas
4. Project context
This means your persistent instructions always surface first.
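A minimal sketch of that ordering, assuming each hit carries its scope (the scope names come from the table above; the sort itself is illustrative, not the actual implementation):

```python
# Lower number = higher priority, matching the list above
SCOPE_PRIORITY = {
    "global_rules": 0,
    "global_preferences": 1,
    "global_personas": 2,
    "project_context": 3,
}

def order_by_scope(hits):
    """Stable sort: scope priority first; relevance order preserved within a scope."""
    return sorted(hits, key=lambda h: SCOPE_PRIORITY[h["scope"]])

hits = [
    {"text": "Uses FastAPI backend", "scope": "project_context"},
    {"text": "Always use type hints", "scope": "global_rules"},
    {"text": "Prefer concise responses", "scope": "global_preferences"},
]
print(order_by_scope(hits)[0]["text"])  # → Always use type hints
```

Because Python's sort is stable, hits within the same scope keep their original relevance order.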
## Integration

### Claude Desktop

Add to `claude_desktop_config.json`:

```json
{
  "mcpServers": {
    "a-modular-kingdom": {
      "command": "python",
      "args": ["/path/to/src/agent/host.py"]
    }
  }
}
```
### Custom Agents

```python
from smolagents import ToolCallingAgent, ToolCollection
from mcp import StdioServerParameters

params = StdioServerParameters(
    command="python",
    args=["/path/to/host.py"],
)

with ToolCollection.from_mcp(params) as tools:
    agent = ToolCallingAgent(tools=list(tools.tools))
    result = agent.run("Search the codebase for auth logic")
```
## Standalone Package

Don't need the full server? Install just the RAG and memory components:

```bash
pip install rag-mem
```

```python
from memory_mcp import RAGPipeline, MemoryStore

# RAG
pipeline = RAGPipeline(document_paths=["./docs"])
pipeline.index()
results = pipeline.search("How does auth work?")

# Memory
memory = MemoryStore()
memory.add("Important fact")
results = memory.search("facts")
```

### CLI

```bash
memory-mcp init
memory-mcp serve --docs ./documents
memory-mcp index ./path/to/files
```
## Technical Stack
- Embeddings: Pluggable providers—Ollama, sentence-transformers, or OpenAI
- Vector DB: Qdrant (local or cloud)
- Keyword Search: BM25 (rank-bm25)
- Reranking: CrossEncoder (ms-marco-MiniLM-L-6-v2)
- Memory: Qdrant with hierarchical scoping
- Protocol: Model Context Protocol (MCP)
## Real-World Application: Google Hackathon
The modularity of A-Modular-Kingdom was demonstrated in my Google Kaggle Hackathon submission—a multi-agent emotional AI system built on Gemma 3n.
*Multi-agent architecture using A-Modular-Kingdom's RAG and Memory modules.*
The system uses a modular pipeline: Vocal Emotion Detection analyzes speech while Gemma 3n's vision assesses facial expressions. The combined emotion tag and transcribed query are passed to a Router Agent that delegates to specialist sub-agents, each backed by A-Modular-Kingdom's RAG and Memory modules for personalized, context-aware responses.
### Modules Used
- RAG: Each sub-agent retrieves relevant context from persistent knowledge bases
- Memory: Long-term storage of user preferences, conversation history, and learned behaviors
- Browser Automation: Playwright MCP tool for web interactions
## Links

**A-Modular-Kingdom**: Stop rebuilding. Start building.