
Lei Ye

Posted on • Originally published at lei-ye.dev

Introducing Maester

The Knowledge Engine of Your Company

Most companies today want the same thing from AI: to turn their internal knowledge into something queryable, explainable, and operational.

In practice this means:

  • Documents scattered across tools
  • Institutional knowledge trapped in teams
  • Data that exists but cannot be used

And the typical solution becomes:

“Let’s build an AI assistant.”

But building an AI demo and building AI infrastructure that survives production are very different things.

Maester is our attempt to build the latter.


What Maester Is

Maester is a reference implementation of a B2B SaaS AI knowledge engine. It demonstrates how a company can transform internal data into a production-grade knowledge system.

At its core, Maester allows organizations to:

  • ingest internal documents
  • structure and embed them
  • retrieve relevant knowledge
  • generate responses with citations
  • trace every operation across the system

But more importantly, Maester is designed as infrastructure, not just an AI feature. That means we are focusing on:

  • reliability
  • traceability
  • operational cost control
  • asynchronous pipelines
  • multi-tenant architecture

We are building this project in public, both as a working system and as a learning artifact. Every design choice will be documented. Every architecture decision will be explained.

This blog serves as a system design journal.


The Infrastructure Problem Most AI SaaS Products Ignore

When teams first add AI to a product, the initial version often works. A prototype connects an LLM, retrieves some documents, and produces answers. But once the system meets real users, things break quickly. We repeatedly see the same failure modes:

1. Timeouts

LLM calls are slow and unpredictable. Without proper timeouts and retries, requests cascade into system failures.
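A minimal sketch of this pattern: a retry wrapper that bounds every attempt with a deadline and backs off between attempts. The `call_with_retries` helper, the `LLMTimeoutError` exception, and the `flaky_call` provider stub are all hypothetical names for illustration, not part of Maester's actual API.

```python
import random
import time

class LLMTimeoutError(Exception):
    """Raised when a provider call exceeds its deadline."""

def call_with_retries(call, *, timeout_s=30.0, max_attempts=3, base_delay_s=0.5):
    """Invoke an LLM call with a per-attempt deadline and exponential backoff.

    `call` is any callable accepting a `timeout_s` keyword. Attempts that
    raise LLMTimeoutError are retried; other errors propagate immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call(timeout_s=timeout_s)
        except LLMTimeoutError:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter to avoid synchronized retry storms.
            delay = base_delay_s * (2 ** (attempt - 1)) * (1 + random.random())
            time.sleep(delay)

# Example: a fake provider that times out once, then succeeds.
attempts = {"n": 0}

def flaky_call(timeout_s):
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise LLMTimeoutError("provider exceeded deadline")
    return "answer"
```

The key point is that the caller never sees a transient timeout: the wrapper either returns a result within a bounded number of attempts or fails loudly, so one slow provider call cannot cascade upstream.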

2. Uncontrolled costs

Every query triggers embedding calls, retrieval operations, and model inference. Without cost tracking and guardrails, usage grows faster than expected.

3. Queues and ingestion pipelines

Document ingestion is not instantaneous. Parsing, chunking, and embedding require asynchronous pipelines that many systems lack.

4. Traceability gaps

When something goes wrong, teams often cannot answer simple questions:

  • What document generated this answer?
  • Which embedding version was used?
  • Which model responded?

Without observability, AI becomes a black box in production.


What “Production-Ready AI Infrastructure” Actually Means

For us, production readiness is not about model quality. It is about system design. A production AI SaaS system must provide:

1. Asynchronous ingestion pipelines

Documents must move through structured stages:
parse → chunk → embed → index.
Each stage should be observable and retryable.
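The stage structure above can be sketched as a pipeline of named, independently retryable steps. The stage bodies below are placeholders (real implementations would call a document parser, a chunker, an embedding model, and a vector store); the point is the shape: recorded stage completions for observability, and per-stage retries so a transient embedding failure never forces a re-parse.

```python
# Placeholder stage implementations for illustration only.
def parse(doc):
    return {**doc, "text": doc["raw"].strip()}

def chunk(doc, size=200):
    text = doc["text"]
    return {**doc, "chunks": [text[i:i + size] for i in range(0, len(text), size)]}

def embed(doc):
    # Placeholder "embedding": one vector per chunk.
    return {**doc, "vectors": [[float(len(c))] for c in doc["chunks"]]}

def index(doc):
    return {**doc, "indexed": True}

STAGES = [("parse", parse), ("chunk", chunk), ("embed", embed), ("index", index)]

def run_pipeline(doc, max_retries=2):
    """Run a document through parse -> chunk -> embed -> index.

    Completed stages are recorded (observability), and each stage retries
    in isolation (retryability).
    """
    completed = []
    for name, stage in STAGES:
        for attempt in range(max_retries + 1):
            try:
                doc = stage(doc)
                completed.append(name)
                break
            except Exception:
                if attempt == max_retries:
                    raise
    return doc, completed

doc, completed = run_pipeline({"id": "doc-1", "raw": "  hello Maester  "})
```

In production each stage would run as a queued worker job rather than synchronously; the synchronous loop here just makes the stage contract visible.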

2. Reliable model access

All LLM access must go through a gateway that manages:

  • timeouts
  • retries
  • provider fallback
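A gateway of this shape can be sketched as an ordered provider list that the gateway walks until one call succeeds. `ModelGateway`, `ProviderError`, and the two provider stubs are illustrative names, not Maester's real interfaces; real providers would wrap SDK clients.

```python
class ProviderError(Exception):
    """Raised when a provider fails or exceeds its deadline."""

class ModelGateway:
    """Route all inference through one gateway that owns timeouts,
    retries, and ordered provider fallback."""

    def __init__(self, providers, retries_per_provider=1):
        self.providers = providers  # ordered list of (name, callable)
        self.retries = retries_per_provider

    def complete(self, prompt, timeout_s=30.0):
        errors = []
        for name, call in self.providers:
            for _ in range(self.retries + 1):
                try:
                    return name, call(prompt, timeout_s=timeout_s)
                except ProviderError as exc:
                    errors.append((name, str(exc)))
        raise ProviderError(f"all providers failed: {errors}")

# Example: primary is down, so the gateway falls back transparently.
def primary(prompt, timeout_s):
    raise ProviderError("primary unavailable")

def fallback(prompt, timeout_s):
    return f"echo: {prompt}"

gateway = ModelGateway([("primary", primary), ("fallback", fallback)])
provider, answer = gateway.complete("hello")
```

Because callers only ever see the gateway, swapping providers, tightening timeouts, or adding a new fallback tier never touches application code.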

3. Usage and cost accounting

Every request must produce a usage record. Production systems must answer:

  • Which tenant generated this cost?
  • Which model generated this response?
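One way to make those questions answerable is to emit an immutable usage record per request. This is a sketch with invented field names and a flat illustrative price; real billing would use per-model rate tables.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UsageRecord:
    """One record per request: who spent what, on which model."""
    tenant_id: str
    request_id: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    # Illustrative flat price; real systems use per-model rate tables.
    usd_per_1k_tokens: float = 0.002

    @property
    def cost_usd(self) -> float:
        total = self.prompt_tokens + self.completion_tokens
        return round(total / 1000 * self.usd_per_1k_tokens, 6)

record = UsageRecord(
    tenant_id="tenant-a",
    request_id="req-123",
    model="example-model",
    prompt_tokens=850,
    completion_tokens=150,
)
```

With a record like this written for every request, "which tenant generated this cost?" becomes a query, not an investigation.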

4. Traceability

Requests must carry a correlation ID through:

  • API layer
  • worker queues
  • model calls

This is how production systems become debuggable.
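A minimal single-process sketch of the idea, using Python's `contextvars`: the API layer mints the ID once, and downstream layers read it implicitly. In a real deployment with a message queue, the ID would be serialized into job payloads or headers, since context variables do not cross process boundaries; all function names here are hypothetical.

```python
import contextvars
import uuid

# The correlation ID is set once at the API boundary and read everywhere.
request_id_var = contextvars.ContextVar("request_id", default=None)

def handle_api_request(payload):
    """API layer: mint the correlation ID once per request."""
    request_id_var.set(str(uuid.uuid4()))
    return run_worker_job(payload)

def run_worker_job(payload):
    """Worker layer: the same ID travels with the job."""
    return call_model(payload)

def call_model(payload):
    """Model layer: every log line and usage record carries the ID."""
    return {"request_id": request_id_var.get(), "answer": payload.upper()}

result = handle_api_request("hello")
```

Every log line, usage record, and model call tagged with the same ID is what turns "something failed" into "request `req-…` failed at the embedding stage."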


How Maester Is Structured

Instead of treating AI as a feature, we treat it as infrastructure. Maester separates the system into clear operational layers.

Architectural Layers

API Layer

Handles request entry, tenant routing, and request validation. This layer also generates request IDs used for tracing.

Knowledge Engine Core

This is where Maester’s core logic lives. Responsibilities include:

  • document retrieval
  • query orchestration
  • interaction with the model gateway
  • enforcing cost budgets
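Budget enforcement in the core can be sketched as a per-tenant guard checked before each model call. The in-memory bookkeeping and the `TenantBudget` name are illustrative; a real deployment would back this with a shared store such as Redis or SQL.

```python
class BudgetExceeded(Exception):
    """Raised when a charge would push a tenant past its limit."""

class TenantBudget:
    """Per-tenant spending guard consulted before each model call."""

    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd):
        # Reject the call *before* spending, rather than billing after the fact.
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"would spend {self.spent_usd + cost_usd:.4f} USD, "
                f"limit {self.limit_usd:.4f} USD"
            )
        self.spent_usd += cost_usd

budget = TenantBudget(limit_usd=0.01)
budget.charge(0.004)
budget.charge(0.004)
# A third charge of the same size would exceed the limit and be rejected.
```

Checking the budget before the call, not after, is what turns cost control from reporting into an actual guardrail.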

Async Worker System

All heavy processing moves to asynchronous workers:

  • document parsing
  • chunking
  • embedding
  • vector indexing

This prevents ingestion tasks from blocking user requests.

Model Gateway

Instead of calling models directly, all inference flows through a gateway. This gateway manages:

  • provider abstraction
  • retry logic
  • token usage tracking
  • future fallback support

Observability Layer

We treat observability as a first-class concern. Every request is traceable across:

  • API requests
  • worker jobs
  • model calls

This allows production debugging without guesswork.


Closing

Maester is not just an AI application.

It is an exploration of how AI systems should be engineered. In the coming posts, we will document:

  • architecture decisions
  • reliability patterns
  • cost control strategies
  • production ML infrastructure design

Our goal is simple: to build a knowledge engine that companies can trust in production, and to make every engineering decision transparent and explainable.

The system starts SMALL.

But the architecture is designed to SCALE.


Originally published on my engineering blog: https://lei-ye.dev/blog/introducing-maester
