DEV Community

Cover image for Building a Safety-First RAG Triage Agent in Python
Tahrim Bilal
Tahrim Bilal

Posted on

Building a Safety-First RAG Triage Agent in Python

On May 1st, I participated in HackerRank Orchestrate 2026 — a 24-hour hackathon where the challenge was deceptively simple: build a terminal-based support triage agent that handles tickets across HackerRank, Claude, and Visa using only a provided support corpus.

🏆 This agent ranked #154 out of 12,885 participants —
top 200 globally, top 1.2% — in my first ever hackathon.
Solo. No team. No paid tools.

The catch? No hallucinations. No unsafe replies. Zero tolerance for wrong answers on fraud or billing tickets.

Here's how I built a hybrid RAG agent that prioritizes safety over speed — and why I burned through 3 API keys in the process.

The Problem: Why Vanilla RAG Wasn't Enough

Most RAG tutorials show how to chunk documents, embed them, and ask questions. That's fine for a blog demo. But for a production support system handling fraud reports, billing disputes, and account compromises, vanilla RAG is dangerous.

What happens when:

  • A user says "My identity was stolen, what should I do?"
  • The retriever finds a doc about "Identity verification for new accounts"
  • The LLM generates a helpful response about uploading ID documents

That's a catastrophic failure. Someone in distress gets a bureaucratic runaround instead of immediate escalation to a human agent.

I needed a system that escalates first, generates second.

The Architecture: 5-Stage Safety-First Pipeline

Stage 1: Classification

One LLM call extracts structured metadata:

# classifier.py
SYSTEM_PROMPT = """You are a support ticket classifier...
Return ONLY a JSON object:
{
  "company": "<HackerRank | Claude | Visa | Unknown>",
  "request_type": "<product_issue | feature_request | bug | invalid>",
  "product_area": "<short phrase>",
  "risk_level": "<low | high>"
}"""

def classify(llm, ticket_company, issue_text):
    result = llm.chat_json(SYSTEM_PROMPT, user_msg)
    # Sanitize and fallback
    return {
        "company": result.get("company", "Unknown"),
        "request_type": result.get("request_type", "product_issue"),
        "product_area": result.get("product_area", "general").lower(),
        "risk_level": result.get("risk_level", "low").lower()
    }
Enter fullscreen mode Exit fullscreen mode

Key insight: I keep risk_level for logging but do NOT use it for escalation. The LLM over-flags benign tickets like "how do I delete my account" as high risk. Deterministic rules are more precise.

Stage 2: Safety Gate (Zero LLM Calls)

This is the heart of the system. Before any retrieval or generation, deterministic rules check for danger:

# safety.py
def check(classification, issue_text):
    text_lower = issue_text.lower()

    # 1. Bug reports → always escalate to engineers
    if classification.get("request_type") == "bug":
        return True, "Bug report escalated to technical team"

    # 2. Sensitive product areas
    product_area = classification.get("product_area", "").lower()
    for sensitive in HIGH_RISK_PRODUCT_AREAS:
        if sensitive in product_area:
            return True, f"Product area '{product_area}' is sensitive"

    # 3. Keyword scan
    for kw in ESCALATION_KEYWORDS:
        if kw in text_lower:
            return True, f"Contains sensitive keyword '{kw}'"

    # 4. Assessment integrity (HackerRank-specific)
    integrity_phrases = [
        "increase my score", "change my score", "graded me unfairly",
        "review my answers", "move me to the next round"
    ]
    for phrase in integrity_phrases:
        if phrase in text_lower:
            return True, f"Assessment integrity dispute: '{phrase}'"

    return False, ""
Enter fullscreen mode Exit fullscreen mode

My Keyword List:

ESCALATION_KEYWORDS = [
    # fraud / financial
    "fraud", "unauthorized charge", "chargeback",
    "scam", "identity theft", "money back",
    "refund request", "billing dispute", "payment dispute",
    # account security
    "account hacked", "account compromised", "someone else logged in",
    "account suspended", "account banned", "account terminated",
    # legal
    "lawsuit", "legal action", "attorney", "lawyer", "court",
    # assessment integrity
    "cheating", "plagiarism", "academic integrity", "proctoring dispute",
    "candidate cheated", "unfair disqualification",
    # other high-risk
    "security breach", "vulnerability", "security vulnerability",
    "ban the seller", "ban this seller", "make visa refund",
    "subscription",
]
Enter fullscreen mode Exit fullscreen mode

Stage 3: Retrieval (Free, Local, Fast)

# retriever.py
class Retriever:
    def __init__(self):
        self.index = faiss.read_index(str(INDEX_FILE))
        self.model = SentenceTransformer(EMBEDDING_MODEL)
        with open(CHUNKS_FILE) as f:
            self.chunks = json.load(f)

    def retrieve(self, query, company=None, top_k=6):
        # Embed query (L2-normalized → dot product == cosine)
        q_vec = self.model.encode([query], normalize_embeddings=True)

        # Company filtering
        if company:
            filtered = [(i, c) for i, c in enumerate(self.chunks) 
                       if c["company"].lower() == company.lower()]
            if len(filtered) < top_k:
                filtered = list(enumerate(self.chunks))  # fallback
        else:
            filtered = list(enumerate(self.chunks))

        # Build temporary index on subset
        idxs = np.array([i for i, _ in filtered])
        subset = np.stack([self.index.reconstruct(int(i)) for i in idxs])

        sub_index = faiss.IndexFlatIP(subset.shape[1])
        sub_index.add(subset)

        scores, positions = sub_index.search(q_vec, min(top_k, len(filtered)))

        return [{"text": self.chunks[int(idxs[p])]["text"], 
                 "score": float(scores[0][i]), ...} 
                for i, p in enumerate(positions[0]) if p >= 0]
Enter fullscreen mode Exit fullscreen mode

Why FAISS: No server, no API cost, loads in <2 seconds. 1773 vectors is tiny — no need for approximate search.

Stage 4: Fast vs Careful Lane

# responder.py
def respond(llm, retriever, classification, issue_text):
    chunks = retriever.retrieve(issue_text, company=classification.get("company"))
    best = retriever.best_score(chunks)

    # FAST LANE: High confidence → single LLM call
    if best >= FAST_LANE_THRESHOLD:  # 0.50
        response = generate(llm, issue_text, chunks)
        return {"status": "replied", "lane": "fast", ...}

    # CAREFUL LANE: Low confidence → verify everything
    if best >= SIMILARITY_THRESHOLD:  # 0.35
        relevant = check_relevance(llm, issue_text, chunks)
        if not relevant:
            # Query rewrite + retry
            rewritten = rewrite_query(llm, issue_text)
            new_chunks = retriever.retrieve(rewritten, ...)
            new_best = retriever.best_score(new_chunks)
            if new_best >= SIMILARITY_THRESHOLD:
                chunks = new_chunks
                best = new_best
            else:
                return {"status": "escalated", "lane": "escalated_no_corpus", ...}

    response = generate(llm, issue_text, chunks)

    # SELF-CHECK: Verify every claim is grounded
    grounded, issue = self_check(llm, issue_text, response, chunks)
    if not grounded:
        return {"status": "escalated", "lane": "escalated_ungrounded", ...}

    return {"status": "replied", "lane": "careful", ...}
Enter fullscreen mode Exit fullscreen mode

Self-Check Prompt:

_SELFCHECK_SYSTEM = """You are a fact-checking assistant. Review whether a support response is fully grounded in the provided context.
Return ONLY a JSON object: {"grounded": true/false, "issue": "<describe any unsupported claim, or empty string if grounded>"}
A response is grounded if every factual claim it makes can be directly traced to the context."""
Enter fullscreen mode Exit fullscreen mode

Stage 5: Output Schema

# main.py
OUTPUT_COLS = ["status", "product_area", "response", "justification", "request_type"]

def _build_row(*, status, product_area, response, justification, request_type):
    return {
        "status": status.lower(),
        "product_area": product_area.lower(),
        "response": response.strip(),
        "justification": justification.strip(),
        "request_type": request_type.lower(),
    }
Enter fullscreen mode Exit fullscreen mode

Challenges faced

The biggest challenge was hitting Groq's rate limits. Groq's free tier allows ~20–30 RPM, and with 2–4 LLM calls per ticket across 29 tickets, I burned through my daily quota faster than expected. I cycled through 3 separate API keys across different accounts before running out entirely.

The root cause was my initial sleep delay of 0.5 seconds between tickets — way too aggressive for a free tier. Each ticket made up to 4 LLM calls:
classify, relevance check, query rewrite, and generate.
That's potentially 120 calls per minute, 6x over the limit.

The fix had two parts. First, I increased the sleep delay to 3 seconds between tickets, bringing the effective call rate well within limits. Second, I added Gemini 2.5 Flash as an automatic fallback — when Groq fails 3 consecutive times, the circuit breaker marks it as unavailable and all subsequent calls route to Gemini transparently. No manual intervention needed, no crashed runs.

The lesson: always calculate your actual API call rate before running a batch job.
tickets × calls_per_ticket ÷ sleep_delay = calls_per_minute.
Do that math before you start, not after you hit your third 429.

Results

The agent processed all 29 tickets across HackerRank, Claude, and Visa.
Against the sample set with expected outputs:

  • 9/10 accuracy on status classification (replied vs escalated)
  • 0 dangerous false negatives — not a single fraud, billing, or security ticket was incorrectly replied to
  • Top 200 globally out of 12,885 registered participants
  • Top 154 out of 1,349 who defended their agent to an AI judge

The one miss: a test expiration ticket where the corpus similarity
score was borderline (0.38). I chose not to lower the threshold to
fix that one case — doing so would risk incorrect replies on tickets
where corpus evidence is genuinely weak. Safe escalation beats
risky reply every time.

What I'd Do Differently

1. Fine-tuned lightweight classifier
Replace the LLM classifier call with a fine-tuned BERT-based model (e.g. distilbert-base-uncased fine-tuned on support ticket data). This eliminates one LLM call per ticket, reduces latency by ~2 seconds, and removes rate limit pressure on the most frequent operation in the pipeline.

2. BM25 + vector hybrid retrieval
Pure dense retrieval struggles with exact product terminology. A ticket saying "LTI integration key" doesn't embed close to "LTI SSO configuration" even though they're the same topic. Combining BM25 keyword matching with dense vector search via Reciprocal Rank Fusion (RRF) would recover these vocabulary-mismatch cases without needing a fine-tuned embedding model.

3. Query expansion with synonyms
The corpus uses product-specific language that customers don't. Adding a synonym layer — "blocked card" → "frozen card", "cancelled subscription" → "paused plan" — before embedding would improve retrieval on edge cases without any model changes.

4. Pre-built per-company FAISS indexes
My current retriever builds a temporary sub-index per query to filter by company. This is O(N) on every call. The cleaner fix is to build three separate indexes at index-build time — one per company — and load them at startup. Zero query-time filtering overhead, pure FAISS search from the first call.

5. Local LLM via Ollama
Running a quantized Llama model locally via Ollama would eliminate API rate limits entirely. I considered it but my hardware couldn't run a 70B model at acceptable speed. With a GPU or a smaller quantized model (e.g. Llama 3.2 3B), this would be the cleanest zero-dependency setup.

The One Thing That Mattered Most

The most important design decision in this entire project wasn't
the retrieval strategy, the dual-LLM setup, or the query rewriting.

It was this: safety decisions should be deterministic Python,
not probabilistic LLM.

An LLM asked "is this ticket sensitive?" will try to be helpful.
It might say no on a fraud case if the phrasing is polite. A Python
rule checking for "unauthorized charge" says yes, every single time,
with zero variation.

In a support context that consistency isn't a nice-to-have.
It's the difference between a system you can audit and one you
can only hope works correctly.

If you're building RAG where wrong answers have real consequences —
start with the safety gate, not the retrieval algorithm.

Code

Full implementation on GitHub:
https://github.com/Tahrim19/hackerrank-orchestrate-may26

All agent code is in the code/ directory. Build the index
locally with python code/build_index.py before running.

Top comments (0)