The Knowledge Engine of Your Company
Most companies today want the same thing from AI: turn their internal knowledge into something queryable, explainable, and operational.
In practice, that goal runs into the same obstacles everywhere:
- Documents scattered across tools
- Institutional knowledge trapped in teams
- Data that exists but cannot be used
And the typical solution becomes:
“Let’s build an AI assistant.”
But building an AI demo and building AI infrastructure that survives production are very different things.
Maester is our attempt to build the latter.
What Maester Is
Maester is a reference implementation of a B2B SaaS AI knowledge engine. It demonstrates how a company can transform internal data into a production-grade knowledge system.
At its core, Maester allows organizations to:
- ingest internal documents
- structure and embed them
- retrieve relevant knowledge
- generate responses with citations
- trace every operation across the system
But more importantly, Maester is designed as infrastructure, not just an AI feature. That means we are focusing on:
- reliability
- traceability
- operational cost control
- asynchronous pipelines
- multi-tenant architecture
We are building this project in public, both as a working system and as a learning artifact. Every design choice will be documented. Every architecture decision will be explained.
This blog serves as a system design journal for that process.
The Infrastructure Problem Most AI SaaS Products Ignore
When teams first add AI to a product, the initial version often works. A prototype connects an LLM, retrieves some documents, and produces answers. But once the system meets real users, things break quickly. We repeatedly see the same failure modes:
1. Timeouts
LLM calls are slow and unpredictable. Without proper timeouts and retries, requests cascade into system failures.
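To make this failure mode concrete, here is a minimal Python sketch (all names are illustrative, not Maester's actual code) of wrapping a slow model call in a hard timeout so one hung request cannot block indefinitely:

```python
import asyncio

# `slow_model_call` stands in for an LLM provider call; the point is that a
# hard timeout turns an unbounded wait into a bounded one.
async def slow_model_call() -> str:
    await asyncio.sleep(1.0)  # simulates a slow or hung provider
    return "answer"

async def call_with_timeout(timeout_s: float) -> str:
    try:
        return await asyncio.wait_for(slow_model_call(), timeout=timeout_s)
    except asyncio.TimeoutError:
        # In a real system this would trigger a retry or a fallback provider.
        return "fallback: model timed out"

print(asyncio.run(call_with_timeout(0.05)))
```

Without that `wait_for`, every caller upstream inherits the provider's latency, and that is how timeouts cascade.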
2. Uncontrolled costs
Every query triggers embedding calls, retrieval operations, and model inference. Without cost tracking and guardrails, usage grows faster than expected.
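One way to picture a guardrail is a per-tenant budget check that runs before, not after, the spend happens. A minimal sketch, with assumed names and made-up prices:

```python
# Hypothetical per-tenant spending guardrail: every model call is priced and
# checked against a budget before it runs, instead of discovered on the bill.
class BudgetGuard:
    def __init__(self, monthly_budget_usd: float):
        self.budget = monthly_budget_usd
        self.spent = 0.0

    def charge(self, tokens: int, usd_per_1k_tokens: float) -> bool:
        cost = tokens / 1000 * usd_per_1k_tokens
        if self.spent + cost > self.budget:
            return False  # reject the call instead of silently overspending
        self.spent += cost
        return True

guard = BudgetGuard(monthly_budget_usd=1.0)
print(guard.charge(tokens=50_000, usd_per_1k_tokens=0.01))  # within budget
print(guard.charge(tokens=60_000, usd_per_1k_tokens=0.01))  # would exceed it
```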
3. Queues and ingestion pipelines
Document ingestion is not instantaneous. Parsing, chunking, and embedding require asynchronous pipelines that many systems lack.
4. Traceability gaps
When something goes wrong, teams often cannot answer simple questions:
- What document generated this answer?
- Which embedding version was used?
- Which model responded?
Without observability, AI becomes a black box in production.
What “Production-Ready AI Infrastructure” Actually Means
For us, production readiness is not about model quality. It is about system design. A production AI SaaS system must provide:
1. Asynchronous ingestion pipelines
Documents must move through structured stages:
parse → chunk → embed → index.
Each stage should be observable and retryable.
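The stage structure above can be sketched in a few lines of Python. This is an illustration, not Maester's implementation: each stage is a named function, and a bounded retry wraps each stage individually, so a transient failure re-runs only that stage rather than the whole pipeline.

```python
from typing import Callable

# Run one pipeline stage with per-stage logging and bounded retries.
def run_stage(name: str, fn: Callable, data, max_attempts: int = 3):
    for attempt in range(1, max_attempts + 1):
        try:
            result = fn(data)
            print(f"{name}: ok (attempt {attempt})")  # observable per stage
            return result
        except Exception as exc:
            print(f"{name}: failed attempt {attempt}: {exc}")
    raise RuntimeError(f"stage {name} exhausted retries")

# Stand-in stage implementations (real ones would call parsers, an embedding
# model, and a vector store).
def parse(doc: str) -> str: return doc.strip()
def chunk(text: str) -> list: return [text[i:i + 20] for i in range(0, len(text), 20)]
def embed(chunks: list) -> list: return [[float(len(c))] for c in chunks]
def index(vectors: list) -> int: return len(vectors)

doc = "  Quarterly onboarding guide for new support engineers.  "
result = doc
for stage_name, stage_fn in [("parse", parse), ("chunk", chunk),
                             ("embed", embed), ("index", index)]:
    result = run_stage(stage_name, stage_fn, result)
print("indexed vectors:", result)
```

In production the stages would run on a queue rather than in a loop, but the property to preserve is the same: every stage has a name, a log line, and a retry policy of its own.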
2. Reliable model access
All LLM access must go through a gateway that manages:
- timeouts
- retries
- provider fallback
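A gateway like that can be reduced to a small control loop. The sketch below uses invented provider names and a simulated outage; the shape (retry within a provider, then fall through to the next) is the point:

```python
# Minimal gateway sketch: try each provider a bounded number of times,
# in priority order, before giving up.
class ProviderError(Exception):
    pass

def flaky_primary(prompt: str) -> str:
    raise ProviderError("primary unavailable")  # simulate an outage

def stable_fallback(prompt: str) -> str:
    return f"[fallback] answer to: {prompt}"

def gateway_call(prompt: str, providers, retries_per_provider: int = 2) -> str:
    for provider in providers:
        for _ in range(retries_per_provider):
            try:
                return provider(prompt)
            except ProviderError:
                continue  # retry this provider, then move to the next one
    raise ProviderError("all providers exhausted")

print(gateway_call("What is our refund policy?", [flaky_primary, stable_fallback]))
```

The application code never learns which provider answered; that is exactly the abstraction boundary the gateway exists to hold.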
3. Usage and cost accounting
Every request must produce a usage record. Production systems must answer:
- Which tenant generated this cost?
- Which model generated this response?
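In code, a usage record is just a row keyed by tenant and model, so both questions above reduce to aggregation. A hedged sketch with illustrative field names and made-up prices:

```python
from dataclasses import dataclass
import time

# One record per request; cost questions become sums over these rows.
@dataclass
class UsageRecord:
    tenant_id: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    usd_per_1k_tokens: float
    timestamp: float

    @property
    def cost_usd(self) -> float:
        total = self.prompt_tokens + self.completion_tokens
        return total / 1000 * self.usd_per_1k_tokens

records = [
    UsageRecord("acme", "small-model", 800, 200, 0.002, time.time()),
    UsageRecord("acme", "large-model", 500, 500, 0.02, time.time()),
]
per_tenant = sum(r.cost_usd for r in records if r.tenant_id == "acme")
print(f"acme spent ${per_tenant:.4f}")
```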
4. Traceability
Requests must carry a correlation ID through:
- API layer
- worker queues
- model calls
This is how production systems become debuggable.
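The mechanics are simple: generate the ID once at the entry point, and make every later layer read it instead of inventing its own. In Python, `contextvars` is one way to do that without threading the ID through every function signature (names below are illustrative):

```python
import contextvars
import uuid

# The correlation ID is set once at the API layer and read everywhere else,
# so log lines from different layers can be joined on one value.
request_id: contextvars.ContextVar[str] = contextvars.ContextVar("request_id")

def log(layer: str, message: str) -> None:
    print(f"request_id={request_id.get()} layer={layer} {message}")

def worker_job(doc: str) -> None:
    log("worker", f"processing {doc}")

def model_call(prompt: str) -> str:
    log("model", "calling provider")
    return "answer"

def handle_api_request(prompt: str) -> str:
    request_id.set(uuid.uuid4().hex)  # generated once, at the entry point
    log("api", "request received")
    worker_job("doc-42")
    return model_call(prompt)

handle_api_request("summarize the onboarding guide")
```

Grep the logs for one `request_id` and the whole journey of a request falls out, across all three layers.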
How Maester Is Structured
Instead of treating AI as a feature, we treat it as infrastructure. Maester separates the system into clear operational layers.
Architectural Layers
API Layer
Handles request entry, tenant routing, and request validation. This layer also generates request IDs used for tracing.
Knowledge Engine Core
This is where Maester’s core logic lives. Responsibilities include:
- document retrieval
- query orchestration
- interaction with the model gateway
- enforcing cost budgets
Async Worker System
All heavy processing moves to asynchronous workers:
- document parsing
- chunking
- embedding
- vector indexing
This prevents ingestion tasks from blocking user requests.
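The pattern in miniature, using Python's in-process `asyncio.Queue` as a stand-in for a real job queue (illustrative only; a production system would use a durable broker):

```python
import asyncio

# The API enqueues ingestion jobs and returns immediately; a background
# worker drains the queue so heavy work never blocks a user request.
async def worker(queue: asyncio.Queue, processed: list) -> None:
    while True:
        doc = await queue.get()
        if doc is None:  # sentinel: shut the worker down
            queue.task_done()
            break
        processed.append(f"embedded:{doc}")  # stands in for parse/chunk/embed
        queue.task_done()

async def main() -> list:
    queue: asyncio.Queue = asyncio.Queue()
    processed: list = []
    task = asyncio.create_task(worker(queue, processed))
    for doc in ["handbook.pdf", "runbook.md"]:
        await queue.put(doc)  # "the API call" finishes as soon as this enqueues
    await queue.put(None)
    await task
    return processed

print(asyncio.run(main()))
```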
Model Gateway
Instead of calling models directly, all inference flows through a gateway. This gateway manages:
- provider abstraction
- retry logic
- token usage tracking
- future fallback support
Observability Layer
We treat observability as a first-class concern. Every request is traceable across:
- API requests
- worker jobs
- model calls
This allows production debugging without guesswork.
Closing
Maester is not just an AI application.
It is an exploration of how AI systems should be engineered. In the coming posts, we will document:
- architecture decisions
- reliability patterns
- cost control strategies
- production ML infrastructure design
Our goal is simple: To build a knowledge engine that companies can trust in production. And to make every engineering decision transparent and explainable.
The system starts small.
But the architecture is designed to scale.
Originally published on my engineering blog: https://lei-ye.dev/blog/introducing-maester
