
Lei Ye

Posted on • Originally published at lei-ye.dev

Introducing Maester

The Knowledge Engine of Your Company

Most companies today want the same thing from AI: to turn their internal knowledge into something queryable, explainable, and operational.

In practice this means:

  • Documents scattered across tools
  • Institutional knowledge trapped in teams
  • Data that exists but cannot be used

And the typical solution becomes:

“Let’s build an AI assistant.”

But building an AI demo and building AI infrastructure that survives production are very different things.

Maester is our attempt to build the latter.


What Maester Is

Maester is a reference implementation of a B2B SaaS AI knowledge engine. It demonstrates how a company can transform internal data into a production-grade knowledge system.

At its core, Maester allows organizations to:

  • ingest internal documents
  • structure and embed them
  • retrieve relevant knowledge
  • generate responses with citations
  • trace every operation across the system

But more importantly, Maester is designed as infrastructure, not just an AI feature. That means we are focusing on:

  • reliability
  • traceability
  • operational cost control
  • asynchronous pipelines
  • multi-tenant architecture

We are building this project in public, both as a working system and as a learning artifact. Every design choice will be documented. Every architecture decision will be explained.

This blog serves as a system design journal.


The Infrastructure Problem Most AI SaaS Products Ignore

When teams first add AI to a product, the initial version often works. A prototype connects an LLM, retrieves some documents, and produces answers. But once the system meets real users, things break quickly. We repeatedly see the same failure modes:

1. Timeouts

LLM calls are slow and unpredictable. Without proper timeouts and retries, requests cascade into system failures.
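A minimal sketch of this pattern: a retry wrapper that bounds every attempt with a deadline and backs off between attempts. The `call_with_retries` helper, the `LLMTimeoutError` exception, and the `flaky_call` provider stub are all hypothetical names for illustration, not part of Maester's actual API.

```python
import random
import time

class LLMTimeoutError(Exception):
    """Raised when a provider call exceeds its deadline."""

def call_with_retries(call, *, timeout_s=30.0, max_attempts=3, base_delay_s=0.5):
    """Invoke an LLM call with a per-attempt deadline and exponential backoff.

    `call` is any callable accepting a `timeout_s` keyword. Attempts that
    raise LLMTimeoutError are retried; other errors propagate immediately.
    """
    for attempt in range(1, max_attempts + 1):
        try:
            return call(timeout_s=timeout_s)
        except LLMTimeoutError:
            if attempt == max_attempts:
                raise
            # Exponential backoff with jitter to avoid synchronized retry storms.
            delay = base_delay_s * (2 ** (attempt - 1)) * (1 + random.random())
            time.sleep(delay)

# Example: a fake provider that times out once, then succeeds.
attempts = {"n": 0}

def flaky_call(timeout_s):
    attempts["n"] += 1
    if attempts["n"] < 2:
        raise LLMTimeoutError("provider exceeded deadline")
    return "answer"
```

The key point is that the caller never sees a transient timeout: the wrapper either returns a result within a bounded number of attempts or fails loudly, so one slow provider call cannot cascade upstream.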

2. Uncontrolled costs

Every query triggers embedding calls, retrieval operations, and model inference. Without cost tracking and guardrails, usage grows faster than expected.

3. Queues and ingestion pipelines

Document ingestion is not instantaneous. Parsing, chunking, and embedding require asynchronous pipelines that many systems lack.

4. Traceability gaps

When something goes wrong, teams often cannot answer simple questions:

  • What document generated this answer?
  • Which embedding version was used?
  • Which model responded?

Without observability, AI becomes a black box in production.


What “Production-Ready AI Infrastructure” Actually Means

For us, production readiness is not about model quality. It is about system design. A production AI SaaS system must provide:

1. Asynchronous ingestion pipelines

Documents must move through structured stages:
parse → chunk → embed → index.
Each stage should be observable and retryable.
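The stage structure above can be sketched as a pipeline of named, independently retryable steps. The stage bodies below are placeholders (real implementations would call a document parser, a chunker, an embedding model, and a vector store); the point is the shape: recorded stage completions for observability, and per-stage retries so a transient embedding failure never forces a re-parse.

```python
# Placeholder stage implementations for illustration only.
def parse(doc):
    return {**doc, "text": doc["raw"].strip()}

def chunk(doc, size=200):
    text = doc["text"]
    return {**doc, "chunks": [text[i:i + size] for i in range(0, len(text), size)]}

def embed(doc):
    # Placeholder "embedding": one vector per chunk.
    return {**doc, "vectors": [[float(len(c))] for c in doc["chunks"]]}

def index(doc):
    return {**doc, "indexed": True}

STAGES = [("parse", parse), ("chunk", chunk), ("embed", embed), ("index", index)]

def run_pipeline(doc, max_retries=2):
    """Run a document through parse -> chunk -> embed -> index.

    Completed stages are recorded (observability), and each stage retries
    in isolation (retryability).
    """
    completed = []
    for name, stage in STAGES:
        for attempt in range(max_retries + 1):
            try:
                doc = stage(doc)
                completed.append(name)
                break
            except Exception:
                if attempt == max_retries:
                    raise
    return doc, completed

doc, completed = run_pipeline({"id": "doc-1", "raw": "  hello Maester  "})
```

In production each stage would run as a queued worker job rather than synchronously; the synchronous loop here just makes the stage contract visible.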

2. Reliable model access

All LLM access must go through a gateway that manages:

  • timeouts
  • retries
  • provider fallback
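A gateway of this shape can be sketched as an ordered provider list that the gateway walks until one call succeeds. `ModelGateway`, `ProviderError`, and the two provider stubs are illustrative names, not Maester's real interfaces; real providers would wrap SDK clients.

```python
class ProviderError(Exception):
    """Raised when a provider fails or exceeds its deadline."""

class ModelGateway:
    """Route all inference through one gateway that owns timeouts,
    retries, and ordered provider fallback."""

    def __init__(self, providers, retries_per_provider=1):
        self.providers = providers  # ordered list of (name, callable)
        self.retries = retries_per_provider

    def complete(self, prompt, timeout_s=30.0):
        errors = []
        for name, call in self.providers:
            for _ in range(self.retries + 1):
                try:
                    return name, call(prompt, timeout_s=timeout_s)
                except ProviderError as exc:
                    errors.append((name, str(exc)))
        raise ProviderError(f"all providers failed: {errors}")

# Example: primary is down, so the gateway falls back transparently.
def primary(prompt, timeout_s):
    raise ProviderError("primary unavailable")

def fallback(prompt, timeout_s):
    return f"echo: {prompt}"

gateway = ModelGateway([("primary", primary), ("fallback", fallback)])
provider, answer = gateway.complete("hello")
```

Because callers only ever see the gateway, swapping providers, tightening timeouts, or adding a new fallback tier never touches application code.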

3. Usage and cost accounting

Every request must produce a usage record. Production systems must answer:

  • Which tenant generated this cost?
  • Which model generated this response?
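One way to make those questions answerable is to emit an immutable usage record per request. This is a sketch with invented field names and a flat illustrative price; real billing would use per-model rate tables.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class UsageRecord:
    """One record per request: who spent what, on which model."""
    tenant_id: str
    request_id: str
    model: str
    prompt_tokens: int
    completion_tokens: int
    # Illustrative flat price; real systems use per-model rate tables.
    usd_per_1k_tokens: float = 0.002

    @property
    def cost_usd(self) -> float:
        total = self.prompt_tokens + self.completion_tokens
        return round(total / 1000 * self.usd_per_1k_tokens, 6)

record = UsageRecord(
    tenant_id="tenant-a",
    request_id="req-123",
    model="example-model",
    prompt_tokens=850,
    completion_tokens=150,
)
```

With a record like this written for every request, "which tenant generated this cost?" becomes a query, not an investigation.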

4. Traceability

Requests must carry a correlation ID through:

  • API layer
  • worker queues
  • model calls

This is how production systems become debuggable.
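A minimal single-process sketch of the idea, using Python's `contextvars`: the API layer mints the ID once, and downstream layers read it implicitly. In a real deployment with a message queue, the ID would be serialized into job payloads or headers, since context variables do not cross process boundaries; all function names here are hypothetical.

```python
import contextvars
import uuid

# The correlation ID is set once at the API boundary and read everywhere.
request_id_var = contextvars.ContextVar("request_id", default=None)

def handle_api_request(payload):
    """API layer: mint the correlation ID once per request."""
    request_id_var.set(str(uuid.uuid4()))
    return run_worker_job(payload)

def run_worker_job(payload):
    """Worker layer: the same ID travels with the job."""
    return call_model(payload)

def call_model(payload):
    """Model layer: every log line and usage record carries the ID."""
    return {"request_id": request_id_var.get(), "answer": payload.upper()}

result = handle_api_request("hello")
```

Every log line, usage record, and model call tagged with the same ID is what turns "something failed" into "request `req-…` failed at the embedding stage."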


How Maester Is Structured

Instead of treating AI as a feature, we treat it as infrastructure. Maester separates the system into clear operational layers.

Architectural Layers

API Layer

Handles request entry, tenant routing, and request validation. This layer also generates request IDs used for tracing.

Knowledge Engine Core

This is where Maester’s core logic lives. Responsibilities include:

  • document retrieval
  • query orchestration
  • interaction with the model gateway
  • enforcing cost budgets
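Budget enforcement in the core can be sketched as a per-tenant guard checked before each model call. The in-memory bookkeeping and the `TenantBudget` name are illustrative; a real deployment would back this with a shared store such as Redis or SQL.

```python
class BudgetExceeded(Exception):
    """Raised when a charge would push a tenant past its limit."""

class TenantBudget:
    """Per-tenant spending guard consulted before each model call."""

    def __init__(self, limit_usd):
        self.limit_usd = limit_usd
        self.spent_usd = 0.0

    def charge(self, cost_usd):
        # Reject the call *before* spending, rather than billing after the fact.
        if self.spent_usd + cost_usd > self.limit_usd:
            raise BudgetExceeded(
                f"would spend {self.spent_usd + cost_usd:.4f} USD, "
                f"limit {self.limit_usd:.4f} USD"
            )
        self.spent_usd += cost_usd

budget = TenantBudget(limit_usd=0.01)
budget.charge(0.004)
budget.charge(0.004)
# A third charge of the same size would exceed the limit and be rejected.
```

Checking the budget before the call, not after, is what turns cost control from reporting into an actual guardrail.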

Async Worker System

All heavy processing moves to asynchronous workers:

  • document parsing
  • chunking
  • embedding
  • vector indexing

This prevents ingestion tasks from blocking user requests.

Model Gateway

Instead of calling models directly, all inference flows through a gateway. This gateway manages:

  • provider abstraction
  • retry logic
  • token usage tracking
  • future fallback support

Observability Layer

We treat observability as a first-class concern. Every request is traceable across:

  • API requests
  • worker jobs
  • model calls

This allows production debugging without guesswork.


Closing

Maester is not just an AI application.

It is an exploration of how AI systems should be engineered. In the coming posts, we will document:

  • architecture decisions
  • reliability patterns
  • cost control strategies
  • production ML infrastructure design

Our goal is simple: to build a knowledge engine that companies can trust in production, and to make every engineering decision transparent and explainable.

The system starts SMALL.

But the architecture is designed to SCALE.


Originally published on my engineering blog: https://lei-ye.dev/blog/introducing-maester
