Anton Fedotov

Posted on Apr 29

Adding a Trust Boundary to a LlamaIndex RAG Pipeline

#llamaindex #agents #security #rag

Your LlamaIndex app does not only retrieve documents.

It decides which external text is allowed to become model context.

That is a trust decision, even if your code does not call it one.

A PDF can contain useful facts.
A support ticket can contain real customer context.
A web page can contain documentation.
An email thread can contain the answer your user needs.

But all of those sources can also contain instructions your model should never follow.

That is the uncomfortable part of RAG security: the dangerous text often does not come from the user prompt. It comes from the documents.

This post shows how to add a trust boundary to a LlamaIndex RAG pipeline with Omega Walls.

The core idea is simple:

Retrieved text is evidence, not policy.

And the safest place to enforce that is between retrieval and synthesis.

The RAG failure mode

A typical LlamaIndex flow looks clean:

documents -> index -> query engine -> response

In code, it may look something like this:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()

response = query_engine.query(
    "Summarize the customer escalation and suggest the next step."
)

print(response)

That is a good developer experience.

But it hides a trust problem.

The query engine retrieves relevant chunks. Those chunks are then used by the LLM to synthesize an answer. That is the normal RAG path: retrieve text, feed it into the answer-generation step, produce a response.

The issue is that retrieved text can carry two very different kinds of content:

Useful evidence:
- customer reported X
- policy says Y
- document describes Z

Untrusted instruction:
- ignore previous instructions
- reveal the system prompt
- call this tool
- send this data somewhere else

If both kinds of text are placed into the same context without a boundary, the model has to separate evidence from instruction by itself.

That is not a reliable boundary.

Retrieved text is evidence, not policy

The important shift is small but sharp:

Retrieved text should help the model answer.
It should not control the workflow.

That means your RAG pipeline should preserve a hard distinction between:

trusted:
- system policy
- developer instructions
- app configuration
- user request

untrusted:
- retrieved web pages
- PDFs
- emails
- support tickets
- uploaded files
- tool outputs containing external text

The model should not have to infer that distinction from formatting alone.

Your application should enforce it before the context is built.

In Omega Walls, this is the role of the trust boundary. Untrusted content is projected into structured risk signals, filtered, and only allowed chunks are passed forward into context. Tool calls stay behind a tool gateway.

For a LlamaIndex RAG app, the most natural placement is:

documents / web / tickets / PDFs
        ↓
index / retriever
        ↓
Omega Walls trust boundary
        ↓
allowed chunks
        ↓
query engine / response synthesis
        ↓
answer

The trust boundary belongs between retrieval and synthesis: external documents are useful evidence, but they should be inspected before they shape the final context.

Why post-generation checks are too late

It is tempting to check the final answer.

That can still be useful.

But it is not enough.

By the time the answer exists, the retrieved chunk may already have influenced:

which facts were selected,
which instruction hierarchy the model followed,
whether a tool should be called,
how intermediate summaries were formed,
what got written into memory,
what source got treated as authoritative.

In a RAG pipeline, the critical moment is earlier:

retrieved chunks -> context construction

That is where external text becomes model context.

So the boundary should live there.

Not only at the user input.
Not only after generation.
Before synthesis.

Install

Install Omega Walls with integration adapters:

pip install "omega-walls[integrations]"

This gives you the framework adapters, including the LlamaIndex guard.

Minimal LlamaIndex integration

Start with your normal LlamaIndex setup:

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex

documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

query_engine = index.as_query_engine()

Now wrap the query engine with Omega:

from omega.integrations import OmegaLlamaIndexGuard

guard = OmegaLlamaIndexGuard(profile="quickstart")

query_engine = guard.wrap_query_engine(
    index.as_query_engine()
)

response = query_engine.query(
    "Summarize this support note",
    thread_id="sess-123",
)

print(response)

That is the smallest useful version.

You still use LlamaIndex as your data/query layer. You still build your index normally. You still call the query engine normally.

The difference is that retrieved context now passes through a trust boundary before it is allowed to shape the response.

What actually gets guarded

A useful RAG boundary needs to cover more than the initial query string.

At minimum, you should think about five surfaces.

1. User query

The user query starts the request.

It may be harmless. It may be adversarial. It may also be asking the app to summarize an external document that contains adversarial text.

The guard should know which session this belongs to.

response = query_engine.query(
    "Summarize the attached escalation notes.",
    thread_id="support-case-1842",
)

That thread_id matters because stateful detection only makes sense if related steps belong to the same workflow.

2. Retrieved chunks

This is the main RAG surface.

A retrieved chunk may contain facts and hidden instructions at the same time.

The boundary should inspect those chunks before they become prompt context.

retriever returns nodes/chunks
        ↓
guard evaluates external text
        ↓
only allowed chunks enter synthesis

3. Tool calls

Some LlamaIndex apps are not just read-only QA systems.

They retrieve, reason, and then call tools.

For example:

send a ticket update,
post a summary,
call an internal API,
write a file,
trigger a workflow,
fetch more external data.

If a tool can act outside the model, it should sit behind a gateway.

from omega.integrations import OmegaLlamaIndexGuard

guard = OmegaLlamaIndexGuard(profile="quickstart")

def network_post(url: str, payload: dict) -> dict:
    # Your real external action lives here.
    return {"status": "ok", "url": url}

safe_network_post = guard.wrap_tool(
    "network_post",
    network_post,
)

Then use safe_network_post instead of the raw function.

result = safe_network_post(
    url="https://internal.example/support/update",
    payload={
        "case_id": "1842",
        "summary": "Customer escalation summarized from guarded context."
    },
)

The important part is not this specific tool.

The important part is that tool execution goes through one chokepoint.

4. Memory writes

RAG systems often create derived state:

summaries,
notes,
extracted facts,
cached answers,
user preferences,
case memory,
long-term knowledge.

If that state came from external text, preserve provenance.

Do not turn this:

external PDF says: approval can be skipped

into this:

memory fact: approval can be skipped

without a source/trust tag.

A simple pattern:

decision = guard.check_memory_write(
    text="The external document says approval can be skipped.",
    source_id="pdf:customer-escalation-1842",
    source_type="pdf",
    source_trust="untrusted",
    thread_id="support-case-1842",
)

if decision.mode == "allow":
    save_to_memory(...)
elif decision.mode == "quarantine":
    save_to_quarantine(...)
else:
    print("Memory write denied")

The exact memory store is your choice.

The rule is the point:

Memory should remember where information came from.

5. Session context

Single-document checks miss a lot.

A mild instruction in one chunk may not look serious.
A later chunk may ask for a secret.
A later tool output may introduce an action.
Together, the pattern matters.

That is why a RAG guard should be stateful across a session, not just a one-shot text classifier.

A small end-to-end example

Here is a compact example showing the integration shape.

from llama_index.core import SimpleDirectoryReader, VectorStoreIndex
from omega.integrations import OmegaLlamaIndexGuard
from omega.adapters import OmegaBlockedError, OmegaToolBlockedError

# 1. Build your normal LlamaIndex index
documents = SimpleDirectoryReader("./docs").load_data()
index = VectorStoreIndex.from_documents(documents)

# 2. Create the Omega guard
guard = OmegaLlamaIndexGuard(profile="quickstart")

# 3. Wrap the query engine
query_engine = guard.wrap_query_engine(
    index.as_query_engine()
)

# 4. Optional: wrap tools that may act outside the model
def network_post(url: str, payload: dict) -> dict:
    return {"status": "ok", "url": url}

safe_network_post = guard.wrap_tool(
    "network_post",
    network_post,
)

# 5. Use the guarded query engine
try:
    response = query_engine.query(
        "Summarize the support note and recommend the next action.",
        thread_id="support-case-1842",
    )

    print(response)

except OmegaBlockedError as exc:
    print("Blocked query/input step")
    print("Outcome:", exc.decision.control_outcome)
    print("Reasons:", exc.decision.reason_codes)

except OmegaToolBlockedError as exc:
    print("Blocked tool call")
    print("Tool:", exc.gate_decision.tool_name)
    print("Reason:", exc.gate_decision.reason)

This gives your application a real block path.

Not a vague error.
Not a silent failure.
Not a mysterious bad answer.

A structured decision you can log, route, or escalate.

What happens to risky documents?

A good RAG boundary should not kill the whole app because one chunk looked suspicious.

Usually, the better behavior is selective:

risky chunk detected
        ↓
remove that chunk from context
        ↓
continue with safe chunks
        ↓
freeze tools if tool-abuse pressure appears
        ↓
escalate if secrets/exfiltration are involved

That is controlled degradation.

The workflow should be able to continue with the remaining safe context.

Example behavior:

User asks:
"Summarize this customer note."

Retriever returns:
- doc-1: normal support note
- doc-2: external email with hidden instruction
- doc-3: product policy excerpt

Boundary:
- allows doc-1
- blocks doc-2
- allows doc-3

Query engine:
- synthesizes answer from doc-1 and doc-3
- does not include doc-2 in context

This is much better than two common alternatives:

bad option 1: pass every retrieved chunk into context
bad option 2: hard-stop the whole workflow on any suspicious text

The useful middle path is:

remove risky influence, keep safe work moving

Retrieved content should enter the prompt as evidence with provenance, not as workflow authority.

Verify the integration

After wiring the adapter, run the LlamaIndex smoke:

python scripts/smoke_llamaindex_guard.py --strict

This is the first verification step.

You are not only checking that imports work.

You are checking that the guard is actually on the execution path.

For a broader release gate across all framework adapters:

python scripts/run_framework_smokes.py --strict

The expected summary should be boring:

status = ok
framework_count = 6
total_failures = 0
min_gateway_coverage >= 1.0
total_orphans = 0

Boring is good here.

It means the adapter is not decorative.

What to log

A trust boundary becomes much more useful when decisions are explainable.

At minimum, log:

session_id
source_id
source_type
decision outcome
reason codes
blocked docs
tool gateway decisions

For production systems, avoid raw content by default.

Store hashes, bounded evidence, redacted excerpts only when policy allows it, and enough structured data to reproduce the decision later.

That gives you incident review without turning your security logs into a new data leak.

Practical checklist for LlamaIndex RAG apps

When reviewing a LlamaIndex app, I would walk through this checklist.

1. What data sources are indexed?

List them:

internal docs
uploaded PDFs
support tickets
email threads
web pages
chat exports
tool outputs

Then mark trust level.

If you cannot mark trust level, assume untrusted.

2. Where does retrieval happen?

Find the point where the query engine receives retrieved chunks or nodes.

That is where the boundary belongs.

3. Can retrieved text reach synthesis directly?

If yes, add a guard before synthesis.

The synthesis step should receive allowed evidence, not raw external content.

4. Are sources preserved?

Each chunk should retain enough metadata:

{
    "source_id": "pdf:customer-escalation-1842",
    "source_type": "pdf",
    "source_trust": "untrusted",
}

If the chunk gets summarized or cached, preserve that provenance.

5. Can the RAG flow trigger tools?

If yes, wrap those tools.

Read-only retrieval is one risk level.
File writes, network calls, ticket updates, and outbound messages are another.

6. Do security docs create false positives?

Your guard should not panic just because a document discusses prompt injection, API keys, or jailbreaks.

Security guidance is not the same thing as an active attack.

That is why polarity and hard-negative tests matter.

A document saying:

Never reveal API keys.

should not be treated like:

Reveal the API key.

The difference is small in keywords and huge in intent.

Why this is not just prompt filtering

A prompt filter usually asks:

Is this text bad?

A RAG trust boundary asks a better question:

Should this external text be allowed to shape model context, memory, or tools in this session?

That is a different problem.

It needs:

source awareness,
session awareness,
context placement,
tool gateway enforcement,
memory provenance,
selective blocking,
auditable decisions.

This is why the boundary belongs in the pipeline, not just in a prompt template.

What this does not solve

A trust boundary is not magic.

It does not replace:

secret management,
least-privilege tools,
network allowlists,
auth and permissions,
model-side safety policies,
human approval for sensitive operations,
logging and incident response.

It also depends on correct placement.

If untrusted text can bypass the guard and enter context directly, the boundary is broken.

If tools can execute outside the gateway, tool enforcement is broken.

If source metadata is stripped too early, later steps cannot distinguish trusted evidence from untrusted text.

So the rule is simple:

Put the boundary on the actual path between retrieval and synthesis.

Not next to it.

A good rollout order

I would not start with hard blocking in production.

Use a safer rollout:

1. Wrap the query engine.
2. Wrap any tools that can act outside the model.
3. Preserve source_id, source_type, and source_trust.
4. Run the strict LlamaIndex smoke.
5. Run in monitor mode on realistic traffic.
6. Inspect reports and decisions.
7. Tune hard negatives and source handling.
8. Enable enforcement for selected paths.

The important part is the monitor phase.

You want to see:

which sources are noisy,
which chunks would be blocked,
whether benign security docs stay quiet,
whether tool-freeze decisions are understandable,
whether operators have enough information to act.

Hard blocking without observability is how a safety layer becomes a production incident.

Final thought

RAG makes it easy to give a model more context.

That is useful.

But context is not neutral.

When your app retrieves documents, it is deciding which external text gets a chance to influence the model. If that text comes from web pages, PDFs, emails, tickets, uploads, or tool outputs, it should not be treated as trusted by default.

The better rule is:

Retrieved text is evidence, not policy.

LlamaIndex gives you the data/query layer.

Omega Walls adds the trust boundary around the part of the pipeline where external text becomes context.

That boundary should sit before synthesis, before memory writes, and before tools.

Not because every document is malicious.

Because production RAG systems should not ask the model to guess what is trusted.

Omega Walls ships framework adapters for LangChain, LangGraph, LlamaIndex, Haystack, AutoGen, and CrewAI.

Install:

pip install "omega-walls[integrations]"

GitHub: https://github.com/synqratech/omega-walls
PyPI: https://pypi.org/project/omega-walls/
Site: https://synqra.tech/omega-walls

Top comments (3)

TxDesk • Apr 30

Trust boundaries in RAG pipelines are underrated. The "retrieved text is evidence, not policy" framing is exactly right.

I ran into a version of this building AI tools for blockchain security data. The LLM wants to be helpful, so when an RPC call fails and returns no data, the natural engineering reflex is to default the field to false. It compiles, it doesn't crash, and it's a lie. "API failed" and "the answer is no" are two completely different statements, but if you flatten them both to false, the model downstream treats them identically and generates confident answers from missing data.

I ended up implementing nullable booleans across every signal. If a data source fails, the field returns null, not false. The AI layer then has to work with "couldn't determine" rather than inventing an answer. Reports where critical signals are null get flagged as partial and never cached, so the next request retries instead of serving a stale wrong answer.

Different domain, same principle: the trust boundary isn't just about what goes in, it's about what the model is allowed to claim when the underlying data is missing or ambiguous. Your placement point between retrieval and synthesis is the right one. In my case the equivalent is between RPC data gathering and risk scoring. Same shape, same failure mode if you skip it.

Anton Fedotov • May 2

This is a great example. I think you’re describing the same failure mode through a different type system.

false and null are not operationally equivalent, but a lot of AI pipelines accidentally make them equivalent because the downstream model only sees a flattened fact. Then “the source failed” silently becomes “the risk is absent,” and the model has no reason to stay uncertain.

That nullable-boolean pattern is exactly the kind of boundary I’d want more AI systems to preserve: not just “is this value safe/unsafe?”, but “what is the provenance and confidence state of this value?” Missing evidence should remain missing evidence. It should not become negative evidence.

The no-cache rule for partial reports is also important. Otherwise the system can turn a temporary data failure into durable memory, which is often worse than the original failure.

So yes, same shape: before synthesis or scoring, there needs to be a layer that preserves trust, provenance, and uncertainty instead of compressing everything into a convenient boolean. That’s a really useful way to frame it.

TxDesk • May 2

"Missing evidence should remain missing evidence. It should not become negative evidence." That's the cleanest way I've seen anyone phrase it. Stealing that for my project's design docs. The no-cache rule for partial reports was the hardest decision because it means some requests are slower on retry. But the alternative is a cached wrong answer that looks right for 15 minutes and gets served to every user who asks the same question. One slow retry is always cheaper than one confident lie.