DEV Community

Jamie Cole

The 5 LLM Architecture Patterns That Scale (And 2 That Do Not)

After building LLM features for 18 months, here are the architecture patterns I have seen work at scale. And the two that consistently fail.

Patterns That Scale

1. Prompt-as-a-Service

```
User Input → Prompt Template → LLM API → Response → User
```

Simple. Reliable. Easy to debug. Most LLM features should start here.
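The pattern above can be sketched in a few lines. This is a minimal illustration, not a real client: `call_llm` is a stand-in for whatever chat-completion API you use, and the template and function names are hypothetical.

```python
# Prompt-as-a-service sketch: a named, testable template plus one call site.
SUMMARIZE_TEMPLATE = (
    "You are a concise assistant.\n"
    "Summarize the following text in one sentence:\n\n{user_input}"
)

def render_prompt(template: str, **fields: str) -> str:
    """Fill a named template with caller-supplied fields."""
    return template.format(**fields)

def call_llm(prompt: str) -> str:
    """Placeholder for a real LLM API call (e.g. an HTTP request)."""
    return f"[model response to {len(prompt)} prompt chars]"

def summarize(user_input: str) -> str:
    prompt = render_prompt(SUMMARIZE_TEMPLATE, user_input=user_input)
    return call_llm(prompt)
```

Because the template is a plain named constant, you can unit-test rendering and diff prompt changes in version control before they ever hit the API.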

2. Retrieval-Augmented Generation (RAG)

```
Query → Vector Search → Context → Prompt → LLM → Response
```

Good for question answering, knowledge bases, and anything that needs to ground answers in specific information.
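The retrieval step can be sketched end to end with toy pieces. Everything here is illustrative: the "embedding" is a bag-of-words counter standing in for a real embedding model, and the documents and function names are made up.

```python
import math
from collections import Counter

DOCS = [
    "Refunds are processed within 5 business days.",
    "Support is available Monday through Friday.",
]

def embed(text: str) -> Counter:
    # Toy bag-of-words "embedding"; a real system uses an embedding model.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse term-count vectors."""
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(query: str, docs: list[str], k: int = 1) -> list[str]:
    """Return the k documents most similar to the query."""
    q = embed(query)
    ranked = sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_rag_prompt(query: str) -> str:
    context = "\n".join(retrieve(query, DOCS))
    return f"Answer using only this context:\n{context}\n\nQuestion: {query}"
```

The shape is the point: retrieval narrows the corpus to relevant context, and the prompt instructs the model to answer only from that context.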

3. Agentic Workflows

```
Task → LLM Planning → Tool Calls → Review → Output
```

For complex tasks requiring multiple steps. More powerful but harder to debug.
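The plan → tool call → review loop can be sketched as below. The planner here is a hard-coded stand-in (a real agent asks the LLM for the step list), and the tool registry and names are hypothetical; the review step that rejects unknown tool calls is the part worth copying.

```python
# Hypothetical tool registry; in a real agent the LLM chooses among these.
TOOLS = {
    "search": lambda q: f"results for '{q}'",
    "calculate": lambda expr: str(eval(expr, {"__builtins__": {}})),
}

def plan(task: str) -> list[tuple[str, str]]:
    """Stand-in planner; a real one would ask the LLM for a step list."""
    return [("calculate", "2 + 2"), ("search", task)]

def run_agent(task: str) -> list[str]:
    results = []
    for tool_name, arg in plan(task):
        tool = TOOLS.get(tool_name)
        if tool is None:
            # Review step: never execute a tool call the registry doesn't know.
            results.append(f"skipped unknown tool: {tool_name}")
            continue
        results.append(tool(arg))
    return results
```

Debugging gets harder here because failures can hide in the plan, the tool, or the review logic; logging each step's input and output is what keeps this pattern tractable.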

4. Caching Layer

```
Input → Cache Check → [HIT]  → Response
                    → [MISS] → LLM → Cache → Response
```

Reduces cost and latency for repeated queries. Essential at scale.

5. Human-in-the-Loop

```
LLM Output → Human Review → [APPROVE] → Output
                          → [REJECT]  → Retry
```

For high-stakes decisions. Expensive but necessary for compliance.
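The approve/reject loop reduces to a small retry wrapper. In this sketch the reviewer is a function so the loop is runnable; in production that function is an approval queue a human works through, and all names here are illustrative.

```python
from typing import Optional

def generate(prompt: str) -> str:
    """Placeholder LLM call."""
    return f"draft for: {prompt}"

def review(output: str) -> bool:
    """Stand-in reviewer; in production this is a human approval queue."""
    return "forbidden" not in output

def generate_with_review(prompt: str, max_retries: int = 3) -> Optional[str]:
    for _ in range(max_retries):
        draft = generate(prompt)
        if review(draft):       # APPROVE: ship it
            return draft
        # REJECT: retry, optionally feeding reviewer notes back into the prompt
    return None                 # escalate instead of shipping unreviewed output
```

The bounded retry count matters: without it, a rejection loop silently burns budget, and returning `None` forces callers to handle the escalation path explicitly.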

Patterns That Do Not Scale

1. Direct LLM → Database Writes

```
User → LLM → Database Write → Response
```

No validation. No review. No audit trail. The model writes (and can destroy) data directly. Works at demo scale; breaks in production.

2. Monolithic Prompt Engineering

```
Complex Prompt = System + Context + History + Constraints + Examples + ...
```

A 2000-token prompt that does everything. Impossible to test, debug, or version control.
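The fix is to break the monolith into small, independently versioned parts and compose them at call time. A minimal sketch, with illustrative part names:

```python
# Each part is a small constant you can test, diff, and version on its own.
SYSTEM = "You are a support assistant."
CONSTRAINTS = "Answer in at most two sentences."
EXAMPLES = "Q: Where is my order?\nA: Check the tracking link in your email."

def compose_prompt(*parts: str) -> str:
    """Join independently maintained prompt parts, skipping empty ones."""
    return "\n\n".join(p.strip() for p in parts if p.strip())
```

Now a change to the constraints is a one-line diff with its own test, instead of an edit buried in a 2000-token blob.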

The Key Insight

LLM architecture is software architecture. The same principles apply: modularity, testing, versioning, observability.

If your LLM feature would fail a code review for a microservice, it will fail in production.


Building scalable LLM features? I write about what works in production. Follow along.
