Joud Awad

Posted on Jun 2

27/30 Days System Design Questions!

#systemdesign #distributedsystems #llm #rag

Your LLM answers are wrong. Not hallucination-wrong — outdated-wrong.

You shipped a customer support bot on GPT-4. Its training data ends in early 2024. Your product has changed 14 times since then. Every week, users get answers that were accurate 8 months ago and are flat-out wrong today.

The team is debating the fix.

Here's the setup:
NestJS API → OpenAI GPT-4 + PostgreSQL (product knowledge base)
~2,000 support queries/day, 15% return wrong answers tied to stale knowledge
Knowledge base updates weekly — new pricing, new features, deprecated flows
Budget: mid-size startup, not training custom models from scratch
You need accurate, up-to-date answers without re-training on every product update

What do you do?

A) RAG — embed your knowledge base, retrieve relevant chunks at query time, inject into context. Model stays the same, knowledge is always fresh.
B) Fine-tune the base model — train GPT-4 (or open-source equivalent) on your product docs. The model internalizes your domain.
C) Fine-tune + RAG hybrid — fine-tune for style/tone/domain fluency, RAG for factual grounding. Best of both worlds.
D) Prompt engineering only — detailed system prompt + few-shot examples. No infra, no training, just better instructions.

All four are in production somewhere. Only one actually solves the problem in front of you.

Pick one — A, B, C, or D — and tell me why. I'll drop the full breakdown in the comments (including the one that feels like the obvious upgrade but makes your freshness problem worse, not better).

If your team is arguing RAG vs fine-tuning right now, share this. The tradeoff is worth mapping before you commit.

Drop your answer 👇

#30DaysOfSystemDesign #SystemDesign #RAG #MachineLearning

Top comments (4)

Joud Awad • Jun 2

B — Fine-tuning (the trap answer)

Feels like the obvious upgrade. Makes the problem worse.

Fine-tuning bakes knowledge into weights. Those weights are static until you retrain. Weekly product updates = weekly retraining at real cost (time + money) + real risk of catastrophic forgetting (new training can degrade recall of old content if not carefully managed).
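To put a rough number on "static until you retrain", here is a small sketch of how long an update waits before users see it. The 4-week retrain cadence and the nightly re-index are assumptions for illustration, not figures from the setup; `visible_on` and `average_lag` are hypothetical helper names.

```python
import math

def visible_on(update_day, refresh_interval_days):
    """Day an update made on `update_day` first reaches users, assuming
    refreshes (retrain or re-index) run on multiples of the interval."""
    return math.ceil(update_day / refresh_interval_days) * refresh_interval_days

def average_lag(update_days, refresh_interval_days):
    """Mean number of days between an update and the refresh that picks it up."""
    lags = [visible_on(d, refresh_interval_days) - d for d in update_days]
    return sum(lags) / len(lags)

weekly_updates = [7, 14, 21, 28]  # one knowledge base update per week

# Assumption: the fine-tuned model is retrained every 4 weeks.
monthly_retrain = average_lag(weekly_updates, 28)

# Assumption: a retrieval index is rebuilt every day.
daily_reindex = average_lag(weekly_updates, 1)

print(f"retrain every 28 days: {monthly_retrain} days stale on average")
print(f"re-index every day:    {daily_reindex} days stale on average")
```

Under these assumptions the only way to shrink the lag for a fine-tuned model is to retrain more often, which is exactly the cost the comment above is pointing at.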

Fine-tuning is the right answer for behavioral changes — tone, format, domain fluency, internal jargon. It teaches the model how to behave, not what to know. Wrong tool for a freshness problem.

Joud Awad • Jun 2 • Edited

Answer: A — RAG

The problem is freshness: the model understands your domain well enough, it just doesn't know what changed.

RAG separates knowledge from reasoning. The model stays frozen (no retraining cost, no deployment cycle). Your knowledge base is indexed in a vector store: update a doc, re-embed it, done. If new pricing ships Monday, your bot knows it by Monday.
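A minimal sketch of the "update a doc, re-embed it" step, assuming an in-memory dictionary as a stand-in for the vector store. `KnowledgeIndex`, `upsert`, and `fake_embed` are hypothetical names for illustration; a real pipeline would call an embedding model and write to an actual store.

```python
import hashlib

class KnowledgeIndex:
    """In-memory stand-in for a vector store, keyed by document id."""

    def __init__(self, embed_fn):
        self.embed_fn = embed_fn
        self.entries = {}  # doc_id -> (content_hash, vector, text)

    def upsert(self, doc_id, text):
        """Re-embed only when the text changed. Returns True if re-embedded."""
        digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
        existing = self.entries.get(doc_id)
        if existing is not None and existing[0] == digest:
            return False  # unchanged, skip the embedding call
        self.entries[doc_id] = (digest, self.embed_fn(text), text)
        return True

    def delete(self, doc_id):
        """Drop a deprecated doc so it can no longer be retrieved."""
        self.entries.pop(doc_id, None)

def fake_embed(text):
    # Placeholder vector; a real pipeline would call an embedding model here.
    return [float(len(text)), float(text.count(" "))]

index = KnowledgeIndex(fake_embed)
index.upsert("pricing", "Pro plan: $39/month")
unchanged = index.upsert("pricing", "Pro plan: $39/month")  # same text, no work
changed = index.upsert("pricing", "Pro plan: $49/month")    # new text, re-embedded
```

Hashing the content means a weekly sync can walk the whole knowledge base and only pay for embeddings on the docs that actually changed.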

At runtime: user asks → embed query → retrieve top-K relevant chunks → inject into context → model reasons over fresh, accurate content.
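The runtime flow above can be sketched end to end in a few lines. This is a toy version under stated assumptions: the "embedding" is a plain word count rather than a real embedding model, the chunks are made-up examples, and `embed`, `retrieve`, and `build_prompt` are hypothetical names. The final prompt is what you would send to the model.

```python
import math
import re
from collections import Counter

# Hypothetical knowledge base chunks; in the setup above these would come from PostgreSQL.
CHUNKS = [
    "The Pro plan costs $49 per month as of the June pricing update.",
    "CSV export was deprecated; use the Reports API instead.",
    "Password resets are handled from the account settings page.",
]

def embed(text):
    """Toy embedding: lowercase word counts. A real system would call an embedding model."""
    return Counter(re.findall(r"[a-z0-9$]+", text.lower()))

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a if t in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

def retrieve(query, chunks, k=2):
    """Return the k chunks most similar to the query."""
    q = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(q, embed(c)), reverse=True)
    return ranked[:k]

def build_prompt(query, chunks, k=2):
    """Inject the retrieved chunks into the prompt ahead of the user question."""
    context = "\n".join(f"- {c}" for c in retrieve(query, chunks, k))
    return (
        "Answer using only the context below. "
        "If the context does not cover the question, say so.\n"
        f"Context:\n{context}\n\nQuestion: {query}"
    )

prompt = build_prompt("How much does the Pro plan cost?", CHUNKS, k=1)
print(prompt)
```

Swapping the word-count `embed` for a real embedding call and the list scan for a vector index changes the quality and the scale, not the shape of the pipeline.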

You've turned a model problem into a data pipeline problem. Every engineering team already knows how to solve data pipelines.

Joud Awad • Jun 2

C — Fine-tune + RAG hybrid (overkill)

Genuinely powerful architecture — but it's step 3, not step 1.

You can't productively fine-tune for style until your factual grounding is solid. The hybrid is what you reach for after you've shipped RAG cleanly and hit the ceiling of what retrieval alone can do.

For a startup with a 15% wrong-answer rate due to stale knowledge, shipping a hybrid is a 3-month project when a 2-week RAG implementation fixes the actual problem.

Joud Awad • Jun 2

D — Prompt engineering only (hits the ceiling fast)

Fastest to ship, first to break.

A detailed system prompt can set tone and format. It cannot inject 400 pages of updated product docs without hitting token limits — and manually picking which docs to include is just RAG without the retrieval infrastructure.
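A quick back-of-the-envelope check on the token limit claim. Only the 400 pages comes from the comment above; the tokens-per-page figure and the two context window sizes are assumptions for illustration, so check your own docs and your model's documented limit.

```python
PAGES = 400              # from the comment above
TOKENS_PER_PAGE = 500    # assumption: rough average for a page of prose

# Assumption: illustrative context window sizes, not tied to a specific model.
CONTEXT_WINDOWS = {
    "8k window": 8_192,
    "128k window": 128_000,
}

docs_tokens = PAGES * TOKENS_PER_PAGE

def fits(window_tokens, reserved_for_answer=1_000):
    """True if all docs plus room for the answer fit in the context window."""
    return docs_tokens + reserved_for_answer <= window_tokens

for name, window in CONTEXT_WINDOWS.items():
    verdict = "fits" if fits(window) else "does not fit"
    print(f"{name}: {docs_tokens:,} tokens of docs {verdict}")
```

Even where a larger window would hold everything, you would be paying to send the full doc set on every one of the ~2,000 daily queries, which is the other reason to retrieve only the relevant chunks.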

Prompt engineering works when the model already knows what it needs to know. It can't fix a knowledge gap.