DEV Community: Ricardo Feldhaus Krisanoski

Why my first RAG layer starts in Postgres, not in a standalone vector database

Ricardo Feldhaus Krisanoski — Sat, 13 Jun 2026 12:30:37 +0000

When people say they are "adding RAG" to a workflow, the conversation often jumps too quickly to infrastructure choices.

Should this use a vector database?
Should there be a reranker?
Should everything go into a knowledge graph?

Those are valid questions, but they are usually not the first question.

The first question is narrower:

What approved knowledge should the workflow be allowed to retrieve before an AI decision happens?

That is why my first retrieval layer for operational AI workflows starts in Postgres, not in a standalone vector database.

The Workflow Problem

In operations-heavy systems, the model usually should not answer from raw memory or from a giant prompt dump.

The useful context already exists somewhere else:

approved response rules;
handoff criteria;
product or service notes;
source or campaign guidance;
operational decisions that were already made by humans.

The hard part is not generating fluent text.

The hard part is retrieving the right approved context, showing which source influenced the decision and refusing when no safe source exists.
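A minimal sketch of "retrieve approved context or refuse" in Python. It assumes each retrieved chunk carries an approval flag and a similarity score. The `ScoredChunk` type, the `select_context` function and the 0.75 cutoff are all hypothetical and only illustrate the shape of the check.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ScoredChunk:
    """Hypothetical shape of one retrieved chunk."""
    chunk_id: str
    source_name: str
    approved: bool   # passed the human approval process
    score: float     # higher means more similar, e.g. 1 - cosine distance


# Assumed cutoff; in practice it would be tuned against golden questions.
MIN_SCORE = 0.75


def select_context(candidates: list[ScoredChunk], min_score: float = MIN_SCORE) -> dict:
    """Keep only approved, sufficiently similar chunks, or refuse explicitly."""
    usable = sorted(
        (c for c in candidates if c.approved and c.score >= min_score),
        key=lambda c: c.score,
        reverse=True,
    )
    if not usable:
        # No safe source: return a refusal instead of letting the model improvise.
        return {"status": "no_answer", "chunk_ids": [], "sources": []}
    return {
        "status": "answer",
        "chunk_ids": [c.chunk_id for c in usable],  # which chunks influenced the answer
        "sources": sorted({c.source_name for c in usable}),
    }


candidates = [
    ScoredChunk("c1", "handoff_criteria", approved=True, score=0.82),
    ScoredChunk("c2", "draft_notes", approved=False, score=0.91),
]
print(select_context(candidates))
# {'status': 'answer', 'chunk_ids': ['c1'], 'sources': ['handoff_criteria']}
print(select_context(candidates[1:]))
# {'status': 'no_answer', 'chunk_ids': [], 'sources': []}
```

The unapproved chunk is dropped even though it scores higher. The point is that approval gates retrieval before similarity does.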

Why Postgres First

For this kind of workflow, most of the surrounding data is already relational:

leads or conversations;
workflow names;
stages and owners;
human review outcomes;
source metadata;
trace logs;
document versions.

So the first technical choice is not "where do vectors live in the abstract?"

It is:

Where can I keep retrieval close to the operational data model?
Where can I log the retrieval path and the final decision together?
Where can I evolve the schema without creating a second system too early?

Postgres plus pgvector is a good first answer to that set of questions.

It lets me keep:

documents and chunks;
metadata such as allowed use and approval requirements;
retrieval traces;
cost estimates;
human review outcomes

in one place.
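As a rough illustration of "in one place", here is a sketch of a single Postgres schema with pgvector. It is held as a Python string so it can be inspected, and it comes with a parameterized retrieval query that filters on metadata in the same SQL statement.

Everything here is an assumption except the pgvector pieces:
- The table and column names are illustrative.
- `vector(1536)` assumes a 1536-dimension embedding model.
- The `<=>` cosine-distance operator, the `vector` type and the HNSW index syntax are pgvector features.

```python
# Hypothetical single-database layout for Postgres + pgvector.
# Table and column names are illustrative; vector(1536) assumes an
# embedding model with 1536 dimensions.
SCHEMA_DDL = """
CREATE EXTENSION IF NOT EXISTS vector;

CREATE TABLE documents (
    id              bigserial PRIMARY KEY,
    source_name     text NOT NULL,
    version         text NOT NULL,
    business_domain text NOT NULL
);

CREATE TABLE chunks (
    id                bigserial PRIMARY KEY,
    document_id       bigint NOT NULL REFERENCES documents(id),
    content           text NOT NULL,
    allowed_use       text NOT NULL,             -- e.g. 'internal' or 'customer_facing'
    requires_approval boolean NOT NULL DEFAULT true,
    embedding         vector(1536) NOT NULL
);

-- Optional approximate index (HNSW needs pgvector 0.5.0 or later).
CREATE INDEX ON chunks USING hnsw (embedding vector_cosine_ops);

CREATE TABLE retrieval_traces (
    id             bigserial PRIMARY KEY,
    workflow_name  text NOT NULL,
    chunk_ids      bigint[] NOT NULL,
    scores         real[] NOT NULL,
    model_name     text,
    est_cost_usd   numeric(10, 6),
    decision       text,
    review_outcome text,
    created_at     timestamptz NOT NULL DEFAULT now()
);
"""


def retrieval_query(query_vec: list[float], allowed_uses: list[str], top_k: int = 5) -> tuple[str, dict]:
    """Build a parameterized top-k similarity query filtered by chunk metadata.

    `<=>` is pgvector's cosine-distance operator (smaller is closer), so
    1 - distance is reported as a similarity score. Placeholders use the
    %(name)s style that psycopg accepts.
    """
    if top_k <= 0:
        raise ValueError("top_k must be positive")
    sql = (
        "SELECT c.id, d.source_name, d.version, c.allowed_use, "
        "1 - (c.embedding <=> %(query_vec)s::vector) AS score "
        "FROM chunks c JOIN documents d ON d.id = c.document_id "
        "WHERE c.allowed_use = ANY(%(allowed_uses)s) "
        "ORDER BY c.embedding <=> %(query_vec)s::vector "
        "LIMIT %(top_k)s"
    )
    # pgvector accepts the text form '[x1,x2,...]', cast here with ::vector.
    vec_literal = "[" + ",".join(str(x) for x in query_vec) + "]"
    return sql, {"query_vec": vec_literal, "allowed_uses": allowed_uses, "top_k": top_k}
```

With psycopg, the pair would be passed as `cur.execute(sql, params)`. Because the metadata filter and the vector ordering live in one query, the same transaction can also write the matching `retrieval_traces` row.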

What The First Version Needs

The first version does not need to be broad.

It needs to be inspectable.

My narrow retrieval scope looks like this:

approved response rules;
product/service notes;
handoff and escalation criteria;
campaign/source guidance;
commercial playbooks.

Each retrieved chunk should carry more than text. It should also carry metadata such as:

source name;
document version;
business domain;
allowed use;
whether human approval is required.

That metadata matters because a workflow may be allowed to use one chunk as internal reasoning support, but not as customer-facing language.
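One way to make the internal-versus-customer-facing distinction explicit is to carry the metadata listed above on every chunk and split retrieved chunks before they reach the prompt. In this sketch, the `AllowedUse` values, the `partition_for_prompt` helper and the review rule are hypothetical.

```python
from dataclasses import dataclass
from enum import Enum


class AllowedUse(Enum):
    # Hypothetical vocabulary; each team would define its own.
    INTERNAL_REASONING = "internal_reasoning"
    CUSTOMER_FACING = "customer_facing"


@dataclass(frozen=True)
class ChunkMetadata:
    source_name: str
    document_version: str
    business_domain: str
    allowed_uses: frozenset[AllowedUse]
    requires_human_approval: bool


def partition_for_prompt(chunks: list[ChunkMetadata]) -> dict:
    """Split retrieved chunks by how the model may use them.

    Only chunks explicitly allowed for customer-facing use may shape the wording
    of a reply; chunks allowed only for internal reasoning can inform the decision
    but are kept apart. Chunks with neither permission are dropped.
    """
    wording, reasoning = [], []
    for c in chunks:
        if AllowedUse.CUSTOMER_FACING in c.allowed_uses:
            wording.append(c)
        elif AllowedUse.INTERNAL_REASONING in c.allowed_uses:
            reasoning.append(c)
    return {
        "customer_facing": wording,
        "internal_only": reasoning,
        # Assumption: any used chunk that requires approval sends the reply to review.
        "needs_review": any(c.requires_human_approval for c in wording + reasoning),
    }


faq = ChunkMetadata("faq", "v2", "sales", frozenset({AllowedUse.CUSTOMER_FACING}), False)
memo = ChunkMetadata("pricing_memo", "v5", "sales", frozenset({AllowedUse.INTERNAL_REASONING}), True)
split = partition_for_prompt([faq, memo])
print([c.source_name for c in split["customer_facing"]], split["needs_review"])
# ['faq'] True
```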

The Eval Mindset

I do not think a retrieval layer is real until it has failure criteria.

So the public prototype includes a small golden-question set:

Does the expected source appear in the top results?
Does the workflow return no-answer or handoff when the source is missing?
Does customer-facing language come only from allowed chunks?
Can I review which chunks influenced the decision later?
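The first two golden-set questions can be turned into a small automated check. This sketch assumes the following:
- A retriever returns ranked `(source_name, score)` pairs.
- A question with no approved source should produce no result above an assumed 0.75 cutoff.

`GoldenQuestion`, `run_golden_set` and the toy index are hypothetical, and the scores are made up. The other two questions, about customer-facing language and reviewability, would be checked against stored traces rather than the retriever.

```python
from dataclasses import dataclass
from typing import Callable, Optional

# A retriever returns ranked (source_name, score) pairs for a question.
Retriever = Callable[[str], list]


@dataclass(frozen=True)
class GoldenQuestion:
    question: str
    expected_source: Optional[str]  # None means no approved source should match


def run_golden_set(questions, retrieve: Retriever, k: int = 3, min_score: float = 0.75) -> dict:
    """Check top-k recall for answerable questions and refusals for the rest."""
    hits, refusals, failures = 0, 0, []
    answerable = [q for q in questions if q.expected_source is not None]
    unanswerable = [q for q in questions if q.expected_source is None]
    for q in answerable:
        top_sources = [src for src, _ in retrieve(q.question)[:k]]
        if q.expected_source in top_sources:
            hits += 1
        else:
            failures.append(("missed_source", q.question))
    for q in unanswerable:
        # A correct refusal means nothing clears the similarity cutoff.
        if any(score >= min_score for _, score in retrieve(q.question)):
            failures.append(("should_refuse", q.question))
        else:
            refusals += 1
    return {
        "hit_rate_at_k": hits / len(answerable) if answerable else None,
        "correct_refusal_rate": refusals / len(unanswerable) if unanswerable else None,
        "failures": failures,
    }


# Toy retriever with made-up scores, standing in for the real Postgres query.
TOY_INDEX = {
    "When do we hand off to a human?": [("handoff_criteria", 0.88), ("faq", 0.41)],
    "Can we promise next-day delivery?": [("faq", 0.52)],
}

golden = [
    GoldenQuestion("When do we hand off to a human?", "handoff_criteria"),
    GoldenQuestion("Can we promise next-day delivery?", None),
]
print(run_golden_set(golden, lambda q: TOY_INDEX.get(q, [])))
# {'hit_rate_at_k': 1.0, 'correct_refusal_rate': 1.0, 'failures': []}
```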

That matters more than announcing that the system has embeddings.

Without retrieval checks, a RAG layer can look sophisticated while still pulling the wrong context.

Observability Is Part Of The Design

The retrieval step and the AI decision step should be traceable together.

I want the same review surface to show:

retrieved chunk IDs;
similarity or retrieval score;
model name;
token and cost estimates;
final decision;
handoff reason;
human review outcome.
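A sketch of one trace record that holds the retrieval step and the decision step together, covering the fields listed above. The `DecisionTrace` class, the cost helper and the per-1k-token prices are placeholders, not real pricing or a real model name.

```python
import json
from dataclasses import asdict, dataclass, field
from datetime import datetime, timezone
from typing import Optional


def estimate_cost_usd(input_tokens: int, output_tokens: int,
                      usd_per_1k_input: float, usd_per_1k_output: float) -> float:
    """Rough cost estimate; the per-1k prices are inputs, not real price data."""
    return input_tokens / 1000 * usd_per_1k_input + output_tokens / 1000 * usd_per_1k_output


@dataclass
class DecisionTrace:
    workflow_name: str
    chunk_ids: list[str]
    retrieval_scores: list[float]
    model_name: str
    input_tokens: int
    output_tokens: int
    est_cost_usd: float
    final_decision: str
    handoff_reason: Optional[str] = None
    human_review_outcome: Optional[str] = None  # filled in later by a reviewer
    created_at: str = field(default_factory=lambda: datetime.now(timezone.utc).isoformat())

    def __post_init__(self) -> None:
        # Each retrieved chunk keeps its own score so the ranking can be audited.
        if len(self.chunk_ids) != len(self.retrieval_scores):
            raise ValueError("chunk_ids and retrieval_scores must align")

    def to_json(self) -> str:
        return json.dumps(asdict(self))


trace = DecisionTrace(
    workflow_name="lead_reply",
    chunk_ids=["c1", "c7"],
    retrieval_scores=[0.88, 0.79],
    model_name="example-model",  # placeholder, not a real model name
    input_tokens=2000,
    output_tokens=500,
    est_cost_usd=estimate_cost_usd(2000, 500, 0.001, 0.002),  # placeholder prices
    final_decision="handoff",
    handoff_reason="refund request",
)
print(trace.to_json())
```

Leaving `human_review_outcome` empty at write time and filling it later keeps the reviewer's verdict on the same row as the retrieval that produced the decision.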

That is the bridge between "the system answered" and "the system answered for a defensible reason."

When I Would Add More Infrastructure

I am not against standalone vector databases.

I would add one later if the system needed:

more search traffic;
more complex filtering boundaries;
separate deployment requirements;
recall/latency needs that justify the extra moving parts.

But before that point, I prefer a smaller stack that makes retrieval, evaluation and auditability visible.

The Practical Lesson

For AI workflows in revenue or operations contexts, the first retrieval layer should optimize for control, review and schema clarity.

Not for maximum architectural novelty.

That is why my first RAG layer starts in Postgres:

closer to operational data;
easier to trace;
easier to evaluate;
easier to keep human-in-the-loop.

The public case study is here:

https://github.com/rkrisa/portfolio-ai-ops/tree/main/cases/operational-knowledge-retrieval-layer

I added a context resolver before an AI sales agent replies

Ricardo Feldhaus Krisanoski — Thu, 28 May 2026 05:38:41 +0000

Most AI sales agents fail before the model writes a single word.

The failure is not always the prompt. It is usually the context.

In a real chat-commerce workflow, a lead can arrive with several competing signals:

the latest customer message;
CRM stage and owner;
previous conversation history;
campaign or source data;
product/category assumptions;
approved response rules;
handoff and support policies.

If all of that gets dumped into a prompt, the model may produce a fluent answer based on the wrong clue.

That is not an AI problem in the abstract. It is an operational design problem.

The Problem

I was designing an AI reception agent for a chat-driven commerce operation.

The goal was simple: help the business reply faster and more consistently without losing the commercial context behind each lead.

But the first automated reply had a hidden risk.

A customer might send a short message like "hi" or "I want more information." On its own, that message is weak. The stronger signal may be the campaign, source, CRM stage, product page, previous conversation or approved sales rule.

If the AI agent receives all possible context at once, it still has to decide what matters.

That decision should not be left entirely to generation.

The Design Choice

I added a context-resolution step before the AI response.

Instead of asking the model to inspect every clue and improvise, the workflow first resolves a smaller object:

{
  "source_priority": "campaign_or_crm_or_message_or_fallback",
  "category": "resolved_commercial_category",
  "confidence": "high_or_medium_or_low",
  "selected_directive": "one_approved_response_rule"
}

The important part is not the exact schema. It is the order of decisions.

The system first decides which commercial context is most trustworthy. Only then does the AI agent write the reply.
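A minimal sketch of a resolver that produces the object above. It assumes signals arrive tagged by source and that a fixed trust order (campaign, then CRM, then message) decides conflicts. The priority order, the strength thresholds and the directive names are hypothetical business choices, not the case study's actual rules.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical trust order when signals disagree; the real order is a business rule.
SOURCE_PRIORITY = ("campaign", "crm", "message")


@dataclass(frozen=True)
class Signal:
    source: str              # "campaign", "crm" or "message"
    category: Optional[str]  # commercial category the signal points to, if any
    strength: float          # assumed 0..1 scale of how specific the signal is


@dataclass(frozen=True)
class ResolvedContext:
    source_priority: str
    category: str
    confidence: str
    selected_directive: str


def resolve_context(signals: list[Signal], directives: dict[str, str]) -> ResolvedContext:
    """Pick the most trusted usable signal before any text is generated."""
    by_source = {s.source: s for s in signals if s.category}
    for source in SOURCE_PRIORITY:
        sig = by_source.get(source)
        if sig is None or sig.category not in directives:
            continue  # no usable signal or no approved rule for it: try the next source
        if sig.strength >= 0.8:
            confidence = "high"
        elif sig.strength >= 0.5:
            confidence = "medium"
        else:
            confidence = "low"
        # Disagreement between sources caps confidence at medium.
        if confidence == "high" and any(o.category != sig.category for o in by_source.values()):
            confidence = "medium"
        return ResolvedContext(source, sig.category, confidence, directives[sig.category])
    return ResolvedContext("fallback", "unknown", "low",
                           directives.get("fallback", "ask_clarifying_question"))


directives = {"sofas": "rule_sofa_offer_v2", "fallback": "rule_consultative_opening"}
# A bare "hi" carries no category, so the campaign signal wins.
print(resolve_context([Signal("message", None, 0.1), Signal("campaign", "sofas", 0.9)], directives))
# ResolvedContext(source_priority='campaign', category='sofas', confidence='high', selected_directive='rule_sofa_offer_v2')
```

Because the resolver is plain code, every reply can later be traced back to which source won and why.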

The Architecture

The workflow looks like this:

Incoming chat lead.
CRM and conversation lookup.
Campaign or source-context lookup.
Context resolver.
Resolved commercial context.
AI response agent.
Structured response and routing decision.
Customer reply or human handoff.

The resolver is intentionally boring.

It is a control layer, not a creativity layer.

It exists to answer questions like:

What is the lead probably asking about?
Which source should win when signals conflict?
Is confidence high enough to answer directly?
Which approved sales rule should be used?
Should the system reply, ask a clarifying question or hand off to a human?
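The last question can be answered by a small, deterministic routing function that runs after the resolver. In this sketch, the sensitive-category list and the reason strings are hypothetical. The check order is the design choice worth noticing: sensitivity is tested first, so high confidence can never bypass a required handoff.

```python
from enum import Enum
from typing import Optional


class Action(Enum):
    REPLY = "reply"
    CLARIFY = "ask_clarifying_question"
    HANDOFF = "handoff_to_human"


# Hypothetical categories this business would never let the agent answer alone.
SENSITIVE_CATEGORIES = frozenset({"refund", "complaint", "custom_pricing"})


def decide_action(category: str, confidence: str, directive: Optional[str]) -> tuple[Action, str]:
    """Return a routing action plus a reason string that can be logged with it."""
    if category in SENSITIVE_CATEGORIES:
        return Action.HANDOFF, f"sensitive category: {category}"
    if directive is None:
        return Action.HANDOFF, "no approved response rule for this context"
    if confidence == "high":
        return Action.REPLY, f"high confidence, using {directive}"
    # Medium or low confidence: ask instead of guessing what the lead wants.
    return Action.CLARIFY, f"{confidence} confidence, clarify before answering"


print(decide_action("sofas", "high", "rule_sofa_offer_v2"))
# (<Action.REPLY: 'reply'>, 'high confidence, using rule_sofa_offer_v2')
```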

Why This Matters

In business workflows, "more context" is not always better.

More context can mean more ambiguity:

old messages compete with new ones;
generic playbooks compete with campaign-specific offers;
product assumptions compete with what the customer actually asked;
internal rules compete with customer-facing language.

The context resolver reduces that ambiguity before the model responds.

The AI layer becomes easier to debug because every reply can be traced back to a chosen context, confidence level and directive.

Guardrails

The workflow keeps several guardrails around the AI response:

low confidence triggers a consultative fallback;
sensitive commercial cases can be routed to human review;
approved response rules are selected before generation;
the agent receives a compact context package instead of a noisy dump;
the system logs what context was used.
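Three of these guardrails can be enforced in one place:
- The approved rule is selected before generation.
- The agent receives a compact package.
- What was used gets logged.

This sketch assumes the resolved context arrives as a dict. The whitelist of keys and the logger name are hypothetical.

```python
import json
import logging
from typing import Any

logger = logging.getLogger("reception_agent")  # hypothetical logger name

# Assumed whitelist: the only fields the model ever sees.
PACKAGE_KEYS = ("category", "confidence", "selected_directive", "latest_message")


def build_agent_input(resolved: dict[str, Any], lead_id: str) -> dict[str, Any]:
    """Build the compact context package for the agent and log exactly what it receives."""
    if not resolved.get("selected_directive"):
        # The approved rule must be chosen before generation, never by the model.
        raise ValueError("no approved directive selected; route to handoff instead")
    package = {k: resolved[k] for k in PACKAGE_KEYS if k in resolved}
    logger.info("context_used %s", json.dumps({"lead_id": lead_id, "package": package}))
    return package


logging.basicConfig(level=logging.INFO)
full = {
    "category": "sofas", "confidence": "high", "selected_directive": "rule_sofa_offer_v2",
    "latest_message": "hi", "conversation_history": ["..."] * 40, "crm_owner": "sales_team_a",
}
agent_input = build_agent_input(full, lead_id="lead-123")
print(sorted(agent_input))
# ['category', 'confidence', 'latest_message', 'selected_directive']
```

The full history and CRM fields stay in the workflow; only the whitelisted keys reach the model, and the log line records exactly that package.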

This is not about making the AI sound more impressive.

It is about making the operational decision safer.

What I Would Measure Next

The current public version lists these as metrics to collect rather than reporting results, because I do not want to publish numbers that have not been validated.

The useful metrics would be:

lead volume handled by the context resolver;
reduction in wrong-context replies;
human handoff rate by risk category;
response-time impact;
manual review time saved.
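Once decisions are logged, two of these metrics reduce to simple aggregations. This sketch assumes log rows with hypothetical `risk_category`, `action` and `review_outcome` fields. The sample rows are invented for illustration and are not results. A "reduction" in wrong-context replies would also need a baseline measured before the resolver was introduced.

```python
from collections import Counter
from typing import Optional


def handoff_rate_by_risk(decisions: list[dict]) -> dict[str, float]:
    """Share of decisions per risk category that went to a human."""
    totals = Counter(d["risk_category"] for d in decisions)
    handoffs = Counter(d["risk_category"] for d in decisions if d["action"] == "handoff_to_human")
    return {cat: handoffs[cat] / n for cat, n in totals.items()}


def wrong_context_rate(decisions: list[dict]) -> Optional[float]:
    """Share of human-reviewed automatic replies that a reviewer flagged as wrong context."""
    reviewed = [d for d in decisions if d["action"] == "reply" and d.get("review_outcome")]
    if not reviewed:
        return None
    return sum(d["review_outcome"] == "wrong_context" for d in reviewed) / len(reviewed)


# Illustrative log rows only; these are not real results.
log = [
    {"risk_category": "low", "action": "reply", "review_outcome": "ok"},
    {"risk_category": "low", "action": "handoff_to_human", "review_outcome": None},
    {"risk_category": "high", "action": "handoff_to_human", "review_outcome": None},
    {"risk_category": "low", "action": "reply", "review_outcome": "wrong_context"},
]
print(handoff_rate_by_risk(log))  # {'low': 0.3333333333333333, 'high': 1.0}
print(wrong_context_rate(log))    # 0.5
```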

The Lesson

For AI agents in revenue workflows, the prompt is only one part of the system.

The harder design question is:

What should the model be allowed to know, trust and act on?

That is why I prefer designing AI agents as operating workflows: context resolution, retrieval, guardrails, structured outputs, human review and observability.

The public case study is here:

https://github.com/rkrisa/portfolio-ai-ops/tree/main/cases/context-aware-ai-reception-agent