In an era where data privacy is paramount, relying on cloud-based AI providers isn't always an option. Whether for compliance, security, or just peace of mind, running a Sovereign AI Stack—a completely local, self-controlled AI infrastructure—is the ultimate goal for many organizations.
Today, we built a Proof of Concept (POC) for such a stack, leveraging open-source tools to create a private, observable, and searchable AI environment. Here is our journey.
The Architecture
Our stack consists of three core components, orchestrated by a Node.js application:
- AI Server: A local LLM served by llama.cpp (exposing an OpenAI-compatible API). This provides the intelligence without data leaving the network.
- Search Engine: Manticore Search (running in Docker). We chose Manticore for its lightweight footprint and powerful full-text search capabilities, essential for RAG (Retrieval-Augmented Generation).
- Observability: AI Observer (running in Docker). You can't manage what you can't measure. This tool captures traces and metrics of our AI interactions.
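Because llama.cpp exposes an OpenAI-compatible API, the orchestrator can talk to it with a plain HTTP POST. A minimal sketch of the "Generate" call follows; the host/port (192.168.0.2:8080) and model name are assumptions for illustration, not the article's exact configuration:

```typescript
// Sketch of calling the llama.cpp server via its OpenAI-compatible
// chat endpoint (POST /v1/chat/completions).
const LLM_URL = "http://192.168.0.2:8080/v1/chat/completions"; // assumed port

function chatRequestBody(prompt: string, temperature = 0.2) {
  return {
    model: "local-model", // single-model llama.cpp servers typically ignore this
    temperature,
    messages: [{ role: "user" as const, content: prompt }],
  };
}

async function askLocalLlm(prompt: string): Promise<string> {
  const res = await fetch(LLM_URL, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify(chatRequestBody(prompt)),
  });
  const data = await res.json();
  return data.choices?.[0]?.message?.content ?? "";
}
```

Keeping the request body in a pure function makes the payload easy to unit-test without a running server.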
The Architecture Visualized
┌─────────────────┐ ┌──────────────────┐
│ │──(1)──▶│ Manticore Search │
│ Orchestrator │ │ (Docker) │
│ (Node.js) │ └──────────────────┘
│ │ ┌──────────────────┐
│ │──(2)──▶│ AI Server LLM │
│ │ │ (192.168.0.2) │
│ │ └──────────────────┘
│ │ ┌──────────────────┐
│ │──(3)──▶│ AI Observer │
└─────────────────┘ │ (Docker) │
└──────────────────┘
│
(4)
▼
(Monitors AI Server)
Component State Flow
[*] ──▶ Init ──▶ Indexing: Create Table (RT)
│
▼
Searching: Documents Added
/ \
/ \
Error: No Hits (Retry) RAG_Construction: Hits Found
│ │
[*] ▼
Inference: Context + Prompt
/ \
/ \
Timeout: Model Slow Success: Answer Generated
│ │
[*] [*]
The Implementation
1. Setting the Foundation (Docker)
We containerized Manticore and AI Observer using docker-compose. One integration challenge was networking: ensuring our orchestrator (the client) could talk to the containers AND the external AI server. Mapping ports (9308, 9312, 3001) was crucial.
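A minimal compose sketch of that setup might look like the following. The Manticore image name is the official one; the AI Observer image name is a placeholder, since the article doesn't specify it:

```yaml
services:
  manticore:
    image: manticoresearch/manticore  # official Manticore Search image
    ports:
      - "9308:9308"   # HTTP/JSON API used by the orchestrator
      - "9312:9312"   # binary protocol
  observer:
    image: ai-observer:latest         # placeholder -- substitute your AI Observer image
    ports:
      - "3001:3001"   # UI / telemetry ingestion port from our setup
```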
Lesson learned: Manticore's SQL interface over HTTP (/sql) is powerful, but its response structure differs slightly from the JSON-only /search endpoint typically used by clients. We had to adapt our client to parse the SQL response structure properly.
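A sketch of that adaptation: the helper below normalizes a SELECT response into plain rows. The shape assumption (rows under `hits.hits`, column values under `_source`) matches what we observed, but verify it against your Manticore version:

```typescript
// Normalize a Manticore /sql SELECT response into flat row objects.
// Shape assumption: rows arrive under hits.hits with values in _source.
interface SqlResponse {
  hits?: {
    total?: number;
    hits?: Array<{ _id?: number | string; _source?: Record<string, unknown> }>;
  };
}

function rowsFromSqlResponse(res: SqlResponse): Array<Record<string, unknown>> {
  // Missing or empty hits simply yields no rows instead of throwing.
  return (res.hits?.hits ?? []).map((h) => ({ id: h._id, ...(h._source ?? {}) }));
}
```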
2. The Orchestrator
We built a simple TypeScript orchestrator that mimics a real-world application flow:
- Ingest: Index sovereign data into Manticore.
- Retrieve: Search Manticore for relevant context (MATCH('Ensures data privacy')).
- Augment: Combine the retrieved context with a user prompt.
- Generate: Send the augmented prompt to the local LLM.
- Observe: Log every step to AI Observer.
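The "Augment" step above can be sketched as a pure function that folds retrieved passages into the final prompt. The field names (title/content) and the template wording are illustrative, not our exact implementation:

```typescript
// Sketch of the Augment step: merge retrieved context with the user question.
interface Passage {
  title: string;
  content: string;
}

function buildAugmentedPrompt(passages: Passage[], question: string): string {
  // Number the passages so the model can cite them if asked.
  const context = passages
    .map((p, i) => `[${i + 1}] ${p.title}\n${p.content}`)
    .join("\n\n");
  return `Answer using ONLY the context below.\n\nContext:\n${context}\n\nQuestion: ${question}`;
}
```

Keeping this step pure makes it trivial to test, independent of Manticore or the LLM.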
3. Verification & Testing
We didn't just build it; we proved it works.
- Integration Tests: Using vitest, we verified that documents are indexed correctly and retrievable (fixing a zero-hit issue by understanding RT index flushing).
- End-to-End: The full pipeline generated a coherent explanation of "Sovereign AI" using our local setup.
- Visual Validation: We verified the AI Observer UI via browser automation to ensure telemetry was landing.
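The zero-hit fix boiled down to polling: right after inserting into a Manticore RT table, a search can briefly return nothing until the index catches up. A transport-agnostic sketch of that retry (the attempt counts and delays are illustrative):

```typescript
// Poll a search function until it returns hits or the attempt budget runs out.
// The search function is injected, so this works with any client/transport.
async function waitForHits<T>(
  search: () => Promise<T[]>,
  attempts = 10,
  delayMs = 200,
): Promise<T[]> {
  for (let i = 0; i < attempts; i++) {
    const hits = await search();
    if (hits.length > 0) return hits;
    // Back off briefly before retrying to give the RT index time to flush.
    await new Promise((resolve) => setTimeout(resolve, delayMs));
  }
  return []; // budget exhausted: caller decides how to handle zero hits
}
```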
Real-World Experience
The most striking realization was the latency trade-off. Our local LLM took roughly 18-80 seconds to produce a comprehensive answer. That's slower than cloud APIs, but it buys total privacy: no token costs, no data leaks.
Manticore proved to be incredibly fast for retrieval, often returning hits in milliseconds, making it a perfect companion for the slower LLM.
Conclusion & What's Next
This POC proves that a Sovereign AI Stack is not only possible but accessible. With tools like Manticore and AI Observer, you can build a robust, private RAG pipeline in an afternoon.
What's Next:
- Implement a persistent vector store for semantic search.
- Optimize LLM inference speed (quantization, GPU offloading).
- Build a chat UI on top of the orchestrator.
Jane Alesi
Managing Director at satware AG | AI Architect | Advocate for GDPR-compliant Sovereign AI
🔗 LinkedIn | GitHub | satware® AI