You Shouldn't Need a Vector Database Just to Make Your Object Follow a Policy Document

#python #ai #rag #pydantic

Here's the RAG tax in full:

Vector database. Embedding pipeline. Chunking logic. Retrieval tuning. Token budget management. Context injection into your prompts — carefully, so you don't blow the window.

Most tutorials spend 200 lines on this before they reach the part that actually matters: the output.

All of this infrastructure exists to answer one question: does this input match something in my document?

That's not a database problem. That's an object that doesn't know its own rules.

The real villain: your object is ignorant of your domain

LLMs know a lot about the world. They don't know your pricing tiers, your exclusion policies, or the compliance rules your team wrote last quarter.

Standard practice says: build a RAG pipeline to retrieve the right chunks from your docs and inject them into the prompt before calling the model.

It works. It also turns a 10-line feature into a 200-line infrastructure project — for every model that needs grounded behavior.

The problem isn't retrieval. The problem is that your domain object has no way to say: "when I'm created, consult this document."

exomodel makes that a one-method override.

Attaching a document to an object

You're building a proposal generator. Your business rules live in a markdown file:

```markdown
# Proposal Rules

- Minimum project budget is $10,000.
- Every proposal must include a 10% safety margin in pricing.
- We do not work with companies in the tobacco or gambling industries.
- Timeline estimates must account for a 2-week QA buffer.
```

This is the document your object needs to know. Here's how it learns it:

```python
from exomodel import ExoModel

class Proposal(ExoModel):
    client: str = ""
    project_title: str = ""
    budget: float = 0.0
    timeline_weeks: int = 0
    summary: str = ""

    @classmethod
    def get_rag_sources(cls):
        return ["proposal_rules.md"]
```

That's the entire integration. No pipeline. No database. No chunking code.

```python
p = Proposal.create("Draft a proposal for Acme Corp, cloud migration project, 8 weeks")

# Illustrative output; exact values depend on the model's response.
print(p.budget)          # e.g. 45000.0  (includes 10% safety margin)
print(p.timeline_weeks)  # 10       (8 weeks + 2-week QA buffer)
print(p.summary)         # A cloud migration engagement for Acme Corp...
```

The object applied your rules. You wrote zero lines of prompt engineering.

What's actually happening

When get_rag_sources() is defined, exomodel runs the full RAG stack for you — internally, at call time, without external dependencies:

- Reads each file listed
- Chunks the content into overlapping segments
- Embeds the chunks into an in-memory vector store
- Retrieves the sections most relevant to the input
- Injects them into the prompt alongside your schema
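The chunk, embed, and retrieve steps above can be sketched in plain Python. This is a toy illustration under stated assumptions, not exomodel's implementation: chunking here uses word windows, a bag-of-words count vector stands in for a real embedding model, and the function names and the `size`/`overlap` values are hypothetical.

```python
import math
import re
from collections import Counter

# The rules file from the example above, inlined so the sketch runs on its own.
RULES = """# Proposal Rules

- Minimum project budget is $10,000.
- Every proposal must include a 10% safety margin in pricing.
- We do not work with companies in the tobacco or gambling industries.
- Timeline estimates must account for a 2-week QA buffer.
"""


def chunk_words(text: str, size: int = 12, overlap: int = 3) -> list[str]:
    """Split text into windows of `size` words; each window repeats the last `overlap` words of the previous one."""
    if not 0 <= overlap < size:
        raise ValueError("overlap must be in [0, size)")
    words = text.split()
    step = size - overlap
    return [" ".join(words[i:i + size]) for i in range(0, max(len(words) - overlap, 1), step)]


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model (assumption): lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query: str, chunks: list[str], k: int = 2) -> list[str]:
    """Embed every chunk into an in-memory list (the 'vector store') and return the top-k by cosine similarity."""
    store = [(embed(chunk), chunk) for chunk in chunks]
    q = embed(query)
    ranked = sorted(store, key=lambda item: cosine(q, item[0]), reverse=True)
    return [chunk for _, chunk in ranked[:k]]


if __name__ == "__main__":
    chunks = chunk_words(RULES)
    for chunk in retrieve("gambling industry client", chunks, k=1):
        print(chunk)
```

Swapping `embed` for a real embedding model is what lets retrieval match meaning rather than shared words; this toy version only matches exact terms.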

No external database. No embedding API to configure. No persistent state to manage. The index lives in memory for the duration of the request.
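The final step in the list, injecting retrieved text alongside the schema, might be assembled along these lines. The prompt wording, the hypothetical `describe_schema` and `build_prompt` helpers, and the plain dataclass used in place of an ExoModel subclass are all assumptions for illustration; exomodel's actual prompt format isn't shown in this post. Everything is built from local values inside the call, so nothing persists after the request.

```python
from dataclasses import dataclass, fields


@dataclass
class Proposal:
    """Plain dataclass stand-in with the same fields as the article's Proposal model."""
    client: str = ""
    project_title: str = ""
    budget: float = 0.0
    timeline_weeks: int = 0
    summary: str = ""


def describe_schema(model: type) -> str:
    """List each field as '- name: type' so the LLM knows what to fill in."""
    return "\n".join(f"- {f.name}: {f.type.__name__}" for f in fields(model))


def build_prompt(user_input: str, retrieved: list[str], model: type) -> str:
    """Combine retrieved rule text, the target schema, and the user's request into one prompt."""
    context = "\n".join(retrieved)
    return (
        "Apply these rules from the reference documents:\n"
        f"{context}\n\n"
        "Return JSON with these fields:\n"
        f"{describe_schema(model)}\n\n"
        f"Request: {user_input}"
    )


if __name__ == "__main__":
    retrieved = [
        "- Every proposal must include a 10% safety margin in pricing.",
        "- Timeline estimates must account for a 2-week QA buffer.",
    ]
    print(build_prompt("Draft a proposal for Acme Corp, cloud migration project, 8 weeks", retrieved, Proposal))
```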

The infrastructure didn't disappear — it moved inside the object where it belongs.
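One way to picture "moved inside the object" is a base class that owns the loading step and lets subclasses declare only their sources through a classmethod hook. `GroundedBase` and `load_context` are hypothetical names, not exomodel's API, and this sketch stops at reading the files; a real implementation would chunk, embed, and retrieve at that point.

```python
from pathlib import Path


class GroundedBase:
    """Hypothetical base class: owns document loading; subclasses only name their sources."""

    @classmethod
    def get_rag_sources(cls) -> list[str]:
        # Default: no documents, so no grounding context.
        return []

    @classmethod
    def load_context(cls) -> str:
        # Read every source the subclass declared, labeling each with its file name.
        # A real pipeline would chunk, embed, and retrieve here instead of returning everything.
        parts = []
        for src in cls.get_rag_sources():
            parts.append(f"[{src}]\n{Path(src).read_text(encoding='utf-8')}")
        return "\n\n".join(parts)


class Proposal(GroundedBase):
    @classmethod
    def get_rag_sources(cls) -> list[str]:
        # The one-method override: the subclass says which documents it must obey.
        return ["proposal_rules.md"]
```

The design choice is the same one the article describes: callers never touch the pipeline, and changing what an object knows means changing the list its hook returns.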

The object now knows what it's not allowed to do

RAG-grounded behavior isn't just about filling fields correctly. It enforces constraints:

```python
p = Proposal.create("Draft a 5k proposal for BetMax Casino")

print(p.run_analysis())
# → This proposal violates company policy:
#   budget is below the $10,000 minimum, and the client
#   operates in the gambling industry.
```

The rule came from proposal_rules.md — not from the model's training data, not from a prompt you wrote, not from a validator you coded by hand.

This is what grounded behavior looks like: traceable to a source, enforced at the object level.

Multiple sources

```python
class Proposal(ExoModel):
    # ... same fields as before ...

    @classmethod
    def get_rag_sources(cls):
        return [
            "proposal_rules.md",
            "pricing_tiers.md",
            "client_blacklist.md",
        ]
```

exomodel indexes all of them together and retrieves across the combined corpus. Add a document — the object learns it. Remove it — the object forgets it. No pipeline to rebuild.
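Retrieving across a combined corpus while staying traceable to a source mostly comes down to carrying the file name with every chunk. The sketch below is an illustration under assumptions, not exomodel's code: it splits on lines instead of overlapping segments, scores by shared words instead of embeddings, and the `client_blacklist.md` contents in the demo are invented.

```python
import re
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Chunk:
    source: str  # file the text came from, so any answer can cite it
    text: str


def build_corpus(documents: dict[str, str]) -> list[Chunk]:
    """Merge several documents into one corpus, one chunk per non-empty line."""
    return [
        Chunk(source=name, text=line.strip())
        for name, body in documents.items()
        for line in body.splitlines()
        if line.strip()
    ]


def tokens(text: str) -> set[str]:
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def best_match(query: str, corpus: list[Chunk]) -> Optional[Chunk]:
    """Return the chunk sharing the most words with the query, or None if nothing overlaps."""
    q = tokens(query)
    score, chunk = max(
        ((len(q & tokens(c.text)), c) for c in corpus),
        key=lambda pair: pair[0],
        default=(0, None),
    )
    return chunk if score > 0 else None


if __name__ == "__main__":
    documents = {
        "proposal_rules.md": "- We do not work with companies in the tobacco or gambling industries.",
        # Hypothetical contents, invented for this demo.
        "client_blacklist.md": "- BetMax Casino",
    }
    hit = best_match("Draft a proposal for BetMax Casino", build_corpus(documents))
    print(hit.source if hit else "no match")
```

Deleting a key from `documents` is the whole "forget" operation, since the corpus is rebuilt from the dict on every call. The toy scoring also shows its limit: shared words alone never connect "Casino" to the "gambling" rule, which is the gap real embeddings are meant to close.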

When does this beat a dedicated vector database?

When your use case is document-grounded object behavior: objects that need to act according to rules defined in a handful of files.

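A dedicated vector database beats this when you have thousands of documents, need persistent indices, or need cross-session retrieval at scale. Because the in-memory index is rebuilt for every request, a large corpus would also be re-embedded on every call. For those cases, use the right tool.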

But if you've ever stood up a vector pipeline just to make one object respect a policy document, that was the RAG tax. You don't owe it anymore.