In an era where data privacy is paramount, relying on cloud-based AI providers isn't always an option. Whether for compliance, security, or just peace of mind, running a Sovereign AI Stack—a completely local, self-controlled AI infrastructure—is the ultimate goal for many organizations.
Today, we built a Proof of Concept (POC) for such a stack, leveraging open-source tools to create a private, observable, and searchable AI environment. Here is our journey.
The Architecture
Our stack consists of three core components, orchestrated by a Node.js application:
- AI Server: A local LLM served by llama.cpp (exposing an OpenAI-compatible API). This provides the intelligence without data leaving the network.
- Search Engine: Manticore Search (running in Docker). We chose Manticore for its lightweight footprint and powerful full-text search capabilities, essential for RAG (Retrieval-Augmented Generation).
- Observability: AI Observer (running in Docker). You can't manage what you can't measure. This tool captures traces and metrics of our AI interactions.
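Because llama.cpp exposes an OpenAI-compatible API, the orchestrator can talk to it with a plain HTTP POST. A minimal sketch of the "Generate" call follows; the host/port (192.168.0.2:8080) and model name are assumptions for illustration, not the article's exact configuration:

```typescript
// Sketch of calling the llama.cpp server via its OpenAI-compatible
// chat endpoint (POST /v1/chat/completions).
const LLM_URL = "http://192.168.0.2:8080/v1/chat/completions"; // assumed port

function chatRequestBody(prompt: string, temperature = 0.2) {
  return {
    model: "local-model", // single-model llama.cpp servers typically ignore this
    temperature,
    messages: [{ role: "user" as const, content: prompt }],
  };
}

async function askLocalLlm(prompt: string): Promise<string> {
  const res = await fetch(LLM_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(chatRequestBody(prompt)),
  });
  const data = await res.json();
  return data.choices?.[0]?.message?.content ?? "";
}
```

Keeping the request body in a pure function makes the payload easy to unit-test without a running server.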
The Architecture Visualized
┌─────────────────┐ ┌──────────────────┐
│ │──(1)──▶│ Manticore Search │
│ Orchestrator │ │ (Docker) │
│ (Node.js) │ └──────────────────┘
│ │ ┌──────────────────┐
│ │──(2)──▶│ AI Server LLM │
│ │ │ (192.168.0.2) │
│ │ └──────────────────┘
│ │ ┌──────────────────┐
│ │──(3)──▶│ AI Observer │
└─────────────────┘ │ (Docker) │
└──────────────────┘
│
(4)
▼
(Monitors AI Server)
Component State Flow
[*] ──▶ Init ──▶ Indexing: Create Table (RT)
│
▼
Searching: Documents Added
/ \
/ \
Error: No Hits (Retry) RAG_Construction: Hits Found
│ │
[*] ▼
Inference: Context + Prompt
/ \
/ \
Timeout: Model Slow Success: Answer Generated
│ │
[*] [*]
The Implementation
1. Setting the Foundation (Docker)
We containerized Manticore and AI Observer using docker-compose. One integration challenge was networking: ensuring our orchestrator (the client) could talk to the containers AND the external AI server. Mapping ports (9308, 9312, 3001) was crucial.
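A minimal compose sketch of that setup might look like the following. The Manticore image name is the official one; the AI Observer image name is a placeholder, since the article doesn't specify it:

```yaml
services:
  manticore:
    image: manticoresearch/manticore  # official Manticore Search image
    ports:
      - "9308:9308"   # HTTP/JSON API used by the orchestrator
      - "9312:9312"   # binary protocol
  observer:
    image: ai-observer:latest         # placeholder -- substitute your AI Observer image
    ports:
      - "3001:3001"   # UI / telemetry ingestion port from our setup
```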
Lesson learned: Manticore's SQL interface over HTTP (/sql) is powerful, but its response structure differs slightly from the JSON-only /search endpoint typically used by clients. We had to adapt our client to parse the SQL response structure properly.
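A sketch of that adaptation: the helper below normalizes a SELECT response into plain rows. The shape assumption (rows under `hits.hits`, column values under `_source`) matches what we observed, but verify it against your Manticore version:

```typescript
// Normalize a Manticore /sql SELECT response into flat row objects.
// Shape assumption: rows arrive under hits.hits with values in _source.
interface SqlResponse {
  hits?: {
    total?: number;
    hits?: Array<{ _id?: number | string; _source?: Record<string, unknown> }>;
  };
}

function rowsFromSqlResponse(res: SqlResponse): Array<Record<string, unknown>> {
  // Missing or empty hits simply yields no rows instead of throwing.
  return (res.hits?.hits ?? []).map((h) => ({ id: h._id, ...(h._source ?? {}) }));
}
```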
2. The Orchestrator
We built a simple TypeScript orchestrator that mimics a real-world application flow:
- Ingest: Index sovereign data into Manticore.
- Retrieve: Search Manticore for relevant context (MATCH('Ensures data privacy')).
- Augment: Combine the retrieved context with a user prompt.
- Generate: Send the augmented prompt to the local LLM.
- Observe: Log every step to AI Observer.
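The "Augment" step above can be sketched as a pure function that folds retrieved passages into the final prompt. The field names (title/content) and the template wording are illustrative, not our exact implementation:

```typescript
// Sketch of the Augment step: merge retrieved context with the user question.
interface Passage {
  title: string;
  content: string;
}

function buildAugmentedPrompt(passages: Passage[], question: string): string {
  // Number the passages so the model can cite them if asked.
  const context = passages
    .map((p, i) => `[${i + 1}] ${p.title}\n${p.content}`)
    .join("\n\n");
  return `Answer using ONLY the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```

Keeping this step pure makes it trivial to test, independent of Manticore or the LLM.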
3. Verification & Testing
We didn't just build it; we proved it works.
- Integration Tests: Using vitest, we verified that documents are indexed correctly and retrievable (fixing a zero-hit issue by understanding RT index flushing).
- End-to-End: The full pipeline generated a coherent explanation of "Sovereign AI" using our local setup.
- Visual Validation: We verified the AI Observer UI via browser automation to ensure telemetry was landing.
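The zero-hit fix boiled down to polling: right after inserting into a Manticore RT table, a search can briefly return nothing until the index catches up. A transport-agnostic sketch of that retry (the attempt counts and delays are illustrative):

```typescript
// Poll a search function until it returns hits or the attempt budget runs out.
// The search function is injected, so this works with any client/transport.
async function waitForHits<T>(
  search: () => Promise<T[]>,
  attempts = 10,
  delayMs = 200,
): Promise<T[]> {
  for (let i = 0; i < attempts; i++) {
    const hits = await search();
    if (hits.length > 0) return hits;
    // Back off briefly before retrying to give the RT index time to flush.
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return []; // budget exhausted: caller decides how to handle zero hits
}
```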
Real-World Experience
The most striking realization was the latency trade-off. Our local LLM took roughly 18-80 seconds to produce a comprehensive answer. That's slower than cloud APIs, but it buys total privacy: no token costs, no data leaks.
Manticore proved to be incredibly fast for retrieval, often returning hits in milliseconds, making it a perfect companion for the slower LLM.
Conclusion & What's Next
This POC proves that a Sovereign AI Stack is not only possible but accessible. With tools like Manticore and AI Observer, you can build a robust, private RAG pipeline in an afternoon.
What's Next:
- Implement a persistent vector store for semantic search.
- Optimize LLM inference speed (quantization, GPU offloading).
- Build a chat UI on top of the orchestrator.
Jane Alesi
Managing Director at satware AG | AI Architect | Advocate for GDPR-compliant Sovereign AI
🔗 LinkedIn | GitHub | satware® AI