We open-sourced Omega Walls: a stateful runtime defense for RAG and AI agents

#agents #opensource #rag #security

Most prompt-injection defenses still think in single turns.

But many real agent failures do not happen in one prompt. They build across retrieved documents, memory carry-over, tool outputs, and later execution.

That is the problem we built Omega Walls for.

Today we’re open-sourcing Omega Walls, a Python runtime defense layer for RAG and tool-using agents.

What Omega Walls does

Omega Walls sits at two important runtime points:

Before final context assembly: retrieved chunks, emails, tickets, attachments, and tool outputs can be inspected before they are allowed into model context.
At tool execution: tool calls can be constrained or blocked when accumulated risk crosses the boundary.
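
To make the two runtime points concrete, here is a minimal sketch of where such hooks could sit in a pipeline. It is not the Omega Walls API: `Session`, `assemble_context`, `execute_tool`, `toy_score`, and `RISK_LIMIT` are hypothetical names invented for illustration, and the keyword-based scorer is a stand-in for whatever detection the real library performs.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, List

RISK_LIMIT = 1.0  # assumed threshold, chosen only for this illustration


@dataclass
class Session:
    """Hypothetical per-session state shared by both hooks."""
    risk: float = 0.0
    quarantined: List[str] = field(default_factory=list)


def toy_score(chunk: str) -> float:
    """Stand-in scorer: a real detector would be far more involved."""
    return 0.6 if "ignore previous instructions" in chunk.lower() else 0.0


def assemble_context(session: Session, chunks: List[str],
                     score: Callable[[str], float]) -> List[str]:
    """Hook 1: inspect untrusted chunks before they reach model context."""
    allowed = []
    for chunk in chunks:
        chunk_risk = score(chunk)
        session.risk += chunk_risk          # risk accumulates across calls
        if chunk_risk > 0:
            session.quarantined.append(chunk)
        else:
            allowed.append(chunk)
    return allowed


def execute_tool(session: Session, name: str,
                 tool: Callable[..., Any], *args: Any) -> Any:
    """Hook 2: refuse tool calls once accumulated risk crosses the limit."""
    if session.risk >= RISK_LIMIT:
        raise PermissionError(f"tool '{name}' blocked at risk {session.risk:.2f}")
    return tool(*args)
```

In this sketch a single suspicious chunk is kept out of context but does not stop tool use; a second one in the same session pushes the accumulated risk over the limit, and the next tool call is refused.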

Instead of treating each chunk as an isolated moderation problem, Omega Walls turns untrusted content into session-level risk state and emits deterministic runtime actions such as:

block
freeze
quarantine
attribution / reason flags
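
The idea of turning untrusted content into session-level risk state can be sketched as follows. This is an illustrative model under stated assumptions, not the library's implementation: the class names, the thresholds, and the exact meaning given to each action are hypothetical, and the signal categories are borrowed from the attack types this post names.

```python
from dataclasses import dataclass, field
from typing import Dict, List

# Assumed thresholds; the real project may define these differently.
FREEZE_AT = 1.0
BLOCK_AT = 2.0


@dataclass
class Decision:
    action: str          # "allow", "quarantine", "freeze", or "block"
    reasons: List[str]   # attribution flags: which source raised which signal


@dataclass
class RiskState:
    """Hypothetical per-session accumulator keyed by signal category."""
    totals: Dict[str, float] = field(default_factory=dict)

    def update(self, source: str, signals: Dict[str, float]) -> Decision:
        # Fold this item's signals into the running session totals.
        for category, weight in signals.items():
            self.totals[category] = self.totals.get(category, 0.0) + weight
        total = sum(self.totals.values())
        reasons = [f"{source}:{c}" for c in sorted(signals) if signals[c] > 0]
        # Fixed rule order: the same input sequence always yields the same actions.
        if total >= BLOCK_AT:
            return Decision("block", reasons)
        if total >= FREEZE_AT:
            return Decision("freeze", reasons)
        if reasons:
            return Decision("quarantine", reasons)
        return Decision("allow", reasons)
```

Feeding the state an email with a 0.5 `takeover` signal yields `quarantine`; a later ticket with a 0.5 `exfiltration` signal brings the session total to 1.0 and yields `freeze`, even though neither item crosses a threshold on its own.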

What it is built for

Omega Walls is designed for:

indirect prompt injection
distributed attacks across multiple chunks or turns
cocktail attacks that combine takeover, exfiltration, tool abuse, and evasion
multi-step flows where no single step looks obviously malicious in isolation

Why we open-sourced it

We think agent security needs more work on runtime trust boundaries, not only better prompt scanning.

If you are building:

RAG pipelines
internal copilots
support or inbox agents
tool-using workflows
agent infrastructure

we’d love your feedback.

GitHub: https://github.com/synqratech/omega-walls
Website: https://synqra.tech/omega-walls
PyPI: https://pypi.org/project/omega-walls/

If you try it, tell us where it breaks, what attack patterns you think matter most, and where this layer should sit in a real stack.

Top comments (2)

Archit Mittal • Apr 14

Stateful runtime defense is the right abstraction for RAG security. Most prompt injection defenses are stateless - they check each input in isolation - but real attacks chain multiple innocent-looking queries to gradually extract sensitive context. The fact that this tracks conversation state means it can catch the 'slow drip' exfiltration patterns that static filters miss entirely. How does it handle the latency overhead on the hot path? In production RAG pipelines I've built, even 50ms extra per query adds up fast when you're doing retrieval + reranking + generation.

Anton Fedotov • Apr 24

Great question. We try to keep Omega off the critical LLM hot path as much as possible: it runs on retrieved chunks / tool-visible context before they enter the model, not as an extra generation step.

In recent tests, short inputs took around 1–2s and larger document packets stayed in the low-seconds range, but we’re still profiling this carefully. The main design goal is that latency scales with the size of the RAG packet being inspected, while the stateful part is only a lightweight session-state update.

So I agree with the concern: for production, the right benchmark is not “prompt scanning latency”, but overhead per RAG / agent iteration.