DtoTHEmoon

Posted on Jun 1

RAG vs Agent: The Decision That Broke My System (And How I Now Enforce It Upfront)

#ai #programming #claude #security

Most people treat the RAG-vs-Agent question as a technical preference. Pick whichever feels right, adjust later.

I did that. It cost me two full rebuilds.

Here's the decision framework I've landed on — and the tool I built to enforce it before the first line of code gets written.

The Mistake: Treating Architecture as Reversible

I was building GrowthOS, a four-module internal talent development platform. When I hit module three — personalized learning path generation — I reached for RAG out of habit. I'd just built a solid RAG knowledge base in module one. The pattern was familiar.

Six days in, I had a retrieval system that could surface relevant learning materials. What it couldn't do:

Read an employee's current skill profile
Analyze which specific gaps needed closing
Decide the optimal sequencing given available time
Monitor whether the employee's behavior changed after completing a path
Trigger re-planning when skills shifted

RAG returned documents. The task required decisions across time. I had picked the wrong primitive, and the cost was a rebuild.

The deeper problem: I had no forcing function that made me answer the architecture question before building.

The Decision Framework

After rebuilding twice, I reduced the RAG-vs-Agent decision to three diagnostic questions:

Question 1: Is this a retrieval task or an execution task?

RAG is fundamentally a retrieval primitive: given a query, find and synthesize relevant content. It's excellent when the output is information.

Agent is an execution primitive: given a goal, take a sequence of actions using tools. It's necessary when the output is a decision or a state change.

The confusion happens because modern RAG pipelines can feel agentic — they chunk, embed, retrieve, rerank, generate. But all of that complexity is still in service of answering a question, not executing a workflow.
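A minimal sketch of the retrieval half of a RAG pipeline shows that shape: a query goes in, information comes out, and nothing outside the function changes. The `retrieve` name, the toy corpus, and the keyword-overlap scoring are assumptions for illustration, not GrowthOS code. The overlap score stands in for embedding similarity and reranking.

```python
def retrieve(query: str, corpus: dict[str, str], k: int = 2) -> list[str]:
    """Return the ids of the k documents sharing the most words with the query.

    Keyword overlap is a toy stand-in for embedding similarity; a real pipeline
    would also rerank and hand the passages to an LLM for synthesis.
    """
    query_terms = set(query.lower().split())

    def overlap(doc_id: str) -> int:
        return len(query_terms & set(corpus[doc_id].lower().split()))

    # sorted() stays stable with reverse=True, so ties keep corpus order.
    return sorted(corpus, key=overlap, reverse=True)[:k]


corpus = {
    "sql-basics": "intro to sql joins and indexes",
    "python-async": "async await and event loops in python",
    "sql-tuning": "tuning sql indexes for query speed",
}
print(retrieve("how do sql indexes work", corpus))  # ['sql-basics', 'sql-tuning']
```

Calling it twice with the same query gives the same answer. It keeps no memory of earlier calls, which is why it cannot carry a multi-step plan.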

Question 2: Does the task require maintaining state across multiple steps?

If yes, you need Agent.

RAG is stateless by design. Each query is independent. You can build workarounds — storing context, chaining queries — but you're fighting the architecture.

Agent is stateful by design. It maintains context, tracks intermediate results, and can loop back based on what it finds.

For GrowthOS module three, the path generation workflow looked like this:
read_profile(employee_id)
→ analyze_skill_gap(profile, target_role)
→ search_materials(gap_list)
→ generate_path(gaps, materials, available_time)
→ monitor_progress(employee_id, path) ← runs continuously
→ trigger_replan(if behavior_signal_detected)

Each arrow is a tool call that depends on the result of the previous one. This is Agent territory, not RAG.
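The same chain can be sketched as a small stateful agent run in Python. The tool names follow the workflow above. Their bodies are hypothetical stubs, and so are the `PROFILES` store, the skill levels, and the one-slot-per-course time budget. The point is that every step reads what the previous step left behind, and a monitoring tick can send the whole chain around again.

```python
from dataclasses import dataclass, field

# Hypothetical profile store: employee id -> {skill: level 0-5}.
PROFILES: dict[str, dict[str, int]] = {"e42": {"sql": 2, "python": 4}}


def read_profile(employee_id: str) -> dict[str, int]:
    return dict(PROFILES[employee_id])


def analyze_skill_gap(profile: dict[str, int], target_role: dict[str, int]) -> list[str]:
    return [skill for skill, need in target_role.items() if profile.get(skill, 0) < need]


def search_materials(gap_list: list[str]) -> dict[str, list[str]]:
    return {skill: [f"{skill}-course"] for skill in gap_list}  # stub catalogue


def generate_path(gaps: list[str], materials: dict[str, list[str]], available_time: int) -> list[str]:
    # Naive sequencing: first course per gap, one time slot each, cut to the budget.
    return [materials[gap][0] for gap in gaps][:available_time]


@dataclass
class PathAgentState:
    employee_id: str
    target_role: dict[str, int]
    available_time: int
    gaps: list[str] = field(default_factory=list)
    path: list[str] = field(default_factory=list)
    replans: int = 0


def plan(state: PathAgentState) -> None:
    """Run the chain; each tool consumes the previous tool's result."""
    profile = read_profile(state.employee_id)
    state.gaps = analyze_skill_gap(profile, state.target_role)
    materials = search_materials(state.gaps)
    state.path = generate_path(state.gaps, materials, state.available_time)


def monitor_progress(state: PathAgentState, behavior_signal_detected: bool) -> None:
    """One monitoring tick; in production this would run on a schedule."""
    if behavior_signal_detected:  # trigger_replan
        state.replans += 1
        plan(state)


state = PathAgentState("e42", target_role={"sql": 4, "python": 3, "ml": 2}, available_time=3)
plan(state)                       # path: ['sql-course', 'ml-course']
PROFILES["e42"]["sql"] = 4        # the employee's SQL skill shifts
monitor_progress(state, behavior_signal_detected=True)
print(state.path, state.replans)  # ['ml-course'] 1
```

`search_materials` is itself a retrieval step. Retrieval can sit inside the agent as one tool, but it does not supply the sequencing, monitoring, or replanning around it.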

Question 3: What is the cost of getting this wrong?

RAG failure modes are usually visible and recoverable: the answer is wrong or incomplete, the user notices, you fix the retrieval. Time cost, not catastrophic.

Agent failure modes can be silent and compounding: the agent takes the wrong action, downstream steps build on that error, you find out six steps later. Or you don't find out until a user hits it in production.
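One generic way to make agent failures loud instead of silent is to check each step's output before the next step consumes it. This is a sketch of that idea, not something the post attributes to GrowthOS or Rein. `run_checked`, `StepFailed`, and the deliberately buggy `buggy_generate_path` are hypothetical.

```python
from typing import Callable

# A step is (name, transform, invariant); state is a plain dict passed along the chain.
Step = tuple[str, Callable[[dict], dict], Callable[[dict], bool]]


class StepFailed(Exception):
    pass


def run_checked(steps: list[Step], state: dict) -> dict:
    """Run steps in order, stopping at the first step whose output breaks its invariant."""
    for name, transform, invariant in steps:
        state = transform(state)
        if not invariant(state):
            raise StepFailed(f"step '{name}' violated its check")
    return state


def analyze_skill_gap(state: dict) -> dict:
    gaps = sorted(skill for skill, level in state["profile"].items() if level < 3)
    return {**state, "gaps": gaps}


def buggy_generate_path(state: dict) -> dict:
    # Bug: silently keeps only the first gap. Later steps would build on it.
    return {**state, "path": [f"{gap}-course" for gap in state["gaps"][:1]]}


steps: list[Step] = [
    ("analyze_skill_gap", analyze_skill_gap, lambda s: "gaps" in s),
    ("generate_path", buggy_generate_path, lambda s: len(s["path"]) == len(s["gaps"])),
]

try:
    run_checked(steps, {"profile": {"sql": 1, "ml": 0, "python": 4}})
except StepFailed as err:
    print(err)  # step 'generate_path' violated its check
```

The error names the step that went wrong, rather than surfacing six steps later as a strange result.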

This asymmetry should directly affect how much upfront rigor you apply to the architecture decision. The higher the cost of failure, the more you need to be certain before you build.

The GrowthOS Module Breakdown

Running all four modules through this framework makes the pattern clear:

Module	Task Type	Stateful?	Failure Cost	Decision
Module 1: Knowledge base	Answer questions about docs	No	Low (visible)	RAG
Module 2: Skill profiling	Compute tags from behavior events	No (batch job)	Medium	Rules engine
Module 3: Learning paths	Generate + monitor + replan	Yes	High (silent drift)	Agent
Module 4: Tracking + flywheel	Detect signals, update weights	Partial	Medium	Hybrid

The interesting case is module two. You might expect a skill-tagging system to use RAG or Agent, but the task is actually deterministic: behavior events map to skill weights via defined rules, decay runs on a schedule, nothing requires LLM inference. A rules engine with a cron job is more reliable and cheaper than an LLM call for every event.
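Here is a minimal sketch of module two without an LLM. It uses a lookup table from behavior events to skill-weight deltas, plus an exponential decay meant to run from a scheduled job. The event names, the deltas, and the 90-day half-life are assumptions. The post does not give GrowthOS's actual rules or decay schedule.

```python
# Hypothetical rule table: behavior event type -> (skill, weight delta).
EVENT_RULES: dict[str, tuple[str, float]] = {
    "completed_sql_course": ("sql", 1.0),
    "merged_sql_migration": ("sql", 0.5),
    "shipped_python_service": ("python", 2.0),
}

HALF_LIFE_DAYS = 90.0  # assumed; not specified in the post


def apply_events(weights: dict[str, float], events: list[str]) -> dict[str, float]:
    """Deterministically add each known event's delta; unknown events are ignored."""
    updated = dict(weights)
    for event in events:
        rule = EVENT_RULES.get(event)
        if rule is not None:
            skill, delta = rule
            updated[skill] = updated.get(skill, 0.0) + delta
    return updated


def decay(weights: dict[str, float], days_elapsed: float) -> dict[str, float]:
    """Exponential decay: a weight halves every HALF_LIFE_DAYS. Meant to run from cron."""
    factor = 0.5 ** (days_elapsed / HALF_LIFE_DAYS)
    return {skill: weight * factor for skill, weight in weights.items()}


weights = apply_events({}, ["completed_sql_course", "merged_sql_migration", "unknown_event"])
print(weights)             # {'sql': 1.5}
print(decay(weights, 90))  # {'sql': 0.75}
```

The same inputs always produce the same weights. No event triggers a model call, and each rule can be unit-tested on its own.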

Overreaching for AI where deterministic logic is sufficient is one of the most common and expensive mistakes in production systems. The question isn't "can AI do this?" but "does this task actually require AI?"
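The table can also be encoded as a small decision function, which is handy as a checklist in code review. This is my sketch, not a published rule set. The `needs_llm` flag comes from the "does this task actually require AI?" question. The output-type labels per module are my reading of the table. Failure cost is left out on purpose, because it governs how much rigor the decision gets, not which primitive wins.

```python
from typing import Literal

Output = Literal["information", "decision", "state change"]
Statefulness = Literal["no", "partial", "yes"]


def choose_primitive(needs_llm: bool, output: Output, stateful: Statefulness) -> str:
    """Map the diagnostic answers to a primitive, mirroring the GrowthOS table."""
    if not needs_llm:
        return "Rules engine"  # deterministic logic is sufficient
    if stateful == "yes":
        return "Agent"         # multi-step work with state carried between calls
    if stateful == "partial":
        return "Hybrid"        # mix primitives for the stateful and stateless parts
    if output == "information":
        return "RAG"           # retrieval task: the output is information
    return "Agent"             # execution task: the output is a decision or state change


# The four GrowthOS modules, with answers read off the table above.
modules = {
    "Knowledge base": (True, "information", "no"),
    "Skill profiling": (False, "state change", "no"),
    "Learning paths": (True, "decision", "yes"),
    "Tracking + flywheel": (True, "state change", "partial"),
}
for name, answers in modules.items():
    print(f"{name}: {choose_primitive(*answers)}")
```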

The Enforcement Problem

Knowing the framework doesn't help if you don't apply it at the right moment. The right moment is before you write any code — at the point where the architecture is still a decision, not a sunk cost.

In practice, most developers (myself included) reach the architecture question after they've already started building. The pattern looks like:

Start implementing a feature
Realize something isn't working
Debug for hours
Eventually diagnose a fundamental architecture mismatch
Rebuild

What I needed was something that forced the decision earlier — ideally the moment I started describing a new module or feature, before the first tool call.

This is the problem Rein is designed to solve.

How Rein Enforces Upfront Architecture Decisions

Rein is an open-source Skill for Claude Code that monitors your development conversations and intervenes at specific diagnostic moments.

For architecture decisions, Rein's Q1 layer (SPEC) enforces a constraint: before any implementation work begins on a feature involving data retrieval or automated decision-making, the SPEC must answer:

What is the output type? (information vs decision vs state change)
Does the task require state across multiple steps?
What is the failure mode and its cost?
Which primitive does this map to: rules engine / RAG / single Agent / multi-Agent?

If you start describing an implementation without these questions answered, Rein surfaces them. Not as a generic checklist, but as targeted questions based on what you've described.
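The gate logic is simple enough to sketch. This is not Rein's implementation or its SPEC format. The `ArchitectureSpec` fields are hypothetical names for the four questions above. An answer of `False` still counts as answered; only a missing answer blocks.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ArchitectureSpec:
    """Hypothetical SPEC record: one field per question, None means unanswered."""
    output_type: Optional[str] = None        # information / decision / state change
    multi_step_state: Optional[bool] = None  # state across multiple steps?
    failure_mode: Optional[str] = None       # what goes wrong, and what it costs
    primitive: Optional[str] = None          # rules engine / RAG / single Agent / multi-Agent


def unanswered(spec: ArchitectureSpec) -> list[str]:
    """Return the SPEC questions that still have no answer, in declaration order."""
    return [f.name for f in fields(spec) if getattr(spec, f.name) is None]


spec = ArchitectureSpec(output_type="information", multi_step_state=False)
missing = unanswered(spec)
if missing:
    print("Before implementing, answer:", ", ".join(missing))
# Before implementing, answer: failure_mode, primitive
```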

The second enforcement point is Q4 (verification scripts). Architecture decisions aren't just written down; they're verified. Before module three was considered "done," verify.sh included:

check "PathAgent tool list matches SPEC" \
  "grep -c 'def get_employee_profile\|def analyze_skill_gap\|def search_learning_materials\|def generate_learning_path\|def monitor_progress' agent/path_agent.py | grep -q '^5$'"

check "MonitorAgent runs on schedule" \
  "grep -q 'monitor_agent\|schedule\|cron' backend/main.py"

If the implementation drifts from the SPEC, the gate fails. You find out immediately, not in production.
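One caveat on the first check: `grep -c` counts matching lines. A commented-out `def`, or a renamed `def get_employee_profile_v2`, still counts toward the five. If the agent lives in Python, a stricter variant of the same gate could parse the module with the standard library's `ast` module. Here is a sketch that reuses the five tool names from the grep; `missing_tools` is a hypothetical helper name.

```python
import ast

# The five tool names from the verify.sh check above.
SPEC_TOOLS = {
    "get_employee_profile",
    "analyze_skill_gap",
    "search_learning_materials",
    "generate_learning_path",
    "monitor_progress",
}


def missing_tools(source: str) -> list[str]:
    """SPEC tools with no real function definition (comments and strings don't count)."""
    defined = {
        node.name
        for node in ast.walk(ast.parse(source))
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    return sorted(SPEC_TOOLS - defined)


sample = (
    "def get_employee_profile(employee_id): ...\n"
    "# def analyze_skill_gap(profile, target_role): ...\n"
    "async def monitor_progress(employee_id, path): ...\n"
)
print(missing_tools(sample))
# ['analyze_skill_gap', 'generate_learning_path', 'search_learning_materials']
```

In verify.sh, a check like this would run against `agent/path_agent.py` and fail whenever the returned list is non-empty.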

The Silence Rule

One design principle worth noting: Rein is silent when there's nothing to flag.

This matters because most Harness tooling errs toward verbosity — warning about everything, asking for confirmation constantly, inserting itself into every decision. The overhead degrades the development experience until you start ignoring it.

Rein's trigger conditions are narrow and specific. For architecture decisions:

Trigger: you describe a new feature involving retrieval or automated decisions, without a SPEC that answers the three diagnostic questions
No trigger: you're implementing a feature with a clear SPEC already written
No trigger: you're debugging, refactoring, or working on UI
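Written as a predicate, the trigger rule is a conjunction, and that is what keeps it narrow. This is a sketch only. Rein works from the conversation itself, so this just models the rule. The `Activity` categories and flag names are hypothetical.

```python
from enum import Enum, auto


class Activity(Enum):
    FEATURE_WORK = auto()
    DEBUGGING = auto()
    REFACTORING = auto()
    UI_WORK = auto()


def should_trigger(activity: Activity, retrieval_or_decisions: bool, spec_answers_questions: bool) -> bool:
    """Intervene only when every condition holds; otherwise stay silent."""
    return (
        activity is Activity.FEATURE_WORK
        and retrieval_or_decisions
        and not spec_answers_questions
    )


print(should_trigger(Activity.FEATURE_WORK, True, False))  # True: feature without a SPEC
print(should_trigger(Activity.FEATURE_WORK, True, True))   # False: SPEC already written
print(should_trigger(Activity.DEBUGGING, True, False))     # False: debugging
```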

In Rein's 16-scenario benchmark, Rein triggered on 100% of cases where intervention was warranted and stayed silent on 100% of cases where it wasn't. The silence test is as important as the trigger test.
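The two numbers measure different failures, which is why both matter. A tool that fires on everything scores 100% on triggering and 0% on silence. Here is a sketch of the scoring with hypothetical scenario results, not Rein's benchmark data.

```python
import math


def _rate(hits: int, total: int) -> float:
    return hits / total if total else math.nan  # undefined with no scenarios


def trigger_and_silence_rates(results: list[tuple[bool, bool]]) -> tuple[float, float]:
    """results holds (intervention_warranted, tool_triggered) for each scenario."""
    fired_when_warranted = [fired for warranted, fired in results if warranted]
    fired_when_not = [fired for warranted, fired in results if not warranted]
    trigger_rate = _rate(sum(fired_when_warranted), len(fired_when_warranted))
    silence_rate = _rate(sum(not fired for fired in fired_when_not), len(fired_when_not))
    return trigger_rate, silence_rate


always_fires = [(True, True), (True, True), (False, True), (False, True)]
print(trigger_and_silence_rates(always_fires))  # (1.0, 0.0)
```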

The Practical Takeaway

If you're building an AI system and haven't explicitly answered these three questions for every component, you're accumulating architecture debt that compounds:

Is the output information, or a decision/state change?
Does the task require state across multiple steps?
What's the cost if this is wrong?

The answers don't have to be permanent — architectures evolve as requirements change. But they need to exist before you build, not after you've rebuilt twice.

RAG and Agent are not interchangeable tools on a gradient. They're different primitives for different problem shapes. Getting the match right early is one of the highest-leverage decisions in AI system design.

Rein is open source: github.com/DtoTHEmoon/rein-skill

Install:

git clone https://github.com/DtoTHEmoon/rein-skill.git ~/.claude/skills/rein

