
Rich Robertson

Posted on • Originally published at myrobertson.com

How I Built a Retrieval-Backed Chatbot to Replace My Resume Screening Step

This is a condensed cross-post. Read the full version on my site.


Recruiter conversations have a consistent problem: a resume gives breadth, but not the system details behind the impact statements. I built AskRich to close that gap — a chatbot that lets hiring teams ask specific technical questions and get citation-backed answers grounded in my actual portfolio and writing.

What it does

AskRich is optimized for realistic recruiter workflows:

  • One-click prompt chips for common questions (architecture trade-offs, delivery scope, measurable outcomes)
  • A freeform chat input for custom questions
  • Citations attached to every answer — so a recruiter can verify the source instead of taking generated output at face value
  • A lightweight, dependency-free web UI in plain JavaScript

The citation-backed output is the core product decision. Once someone can see where an answer comes from, they tend to ask sharper follow-ups — and the conversation gets more useful faster.

Architecture overview

At a high level: a thin web client over a retrieval-backed chat API, served by a Cloudflare Worker.

The Worker supports three runtime modes: upstream (proxy to retrieval API), local (built-in corpus), and openai (direct model path with retrieval-aware constraints). This lets me test and route independently without redeploying the client.
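The mode switch can be sketched roughly like this. This is illustrative, not the actual Worker code: the env variable name, handler names, and fallback-to-local default are my assumptions.

```javascript
// Hypothetical sketch of the three-mode routing described above.
const MODES = ["upstream", "local", "openai"];

function resolveMode(env) {
  // Mode comes from Worker configuration, so routing changes
  // don't require redeploying the client.
  const mode = (env.CHAT_MODE || "local").toLowerCase();
  return MODES.includes(mode) ? mode : "local";
}

async function handleChat(request, env) {
  switch (resolveMode(env)) {
    case "upstream":
      // Proxy the chat request to the retrieval API unchanged.
      return fetch(env.UPSTREAM_URL, request);
    case "openai":
      // Direct model path with retrieval-aware constraints in the prompt.
      return answerWithModel(request, env); // illustrative handler
    default:
      // Built-in corpus bundled with the Worker.
      return answerFromLocalCorpus(request, env); // illustrative handler
  }
}
```

Keeping the mode server-side means each backend can be A/B-tested or swapped without any client change.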

The part that actually made it better: a feedback loop

The first version was anecdotal. I'd notice a weak answer, edit something, and hope for the best.

The current version records structured events for every question, answer, and thumbs-up/thumbs-down interaction — all linked by stable event IDs. That lets me triage a specific low-rated answer with its exact question text, citation count, latency, and backend mode instead of debugging in the abstract.

Triage classifies failures into four buckets:

  • Corpus gap — the content just isn't there
  • Retrieval/ranking issue — the right content exists but isn't surfaced
  • Prompt/format issue — generation quality or response clarity
  • Out-of-scope — the question type needs routing or a guardrail
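A first-pass bucketing of a low-rated event could look like the sketch below. The signals and their order are assumptions for illustration; real triage is a judgment call informed by these fields, not a pure function of them.

```javascript
// Illustrative heuristic only: route a low-rated event to a triage bucket.
// `inScope`, `retrievedDocs`, and `citationCount` are assumed event fields.
function triageBucket(event) {
  if (!event.inScope) return "out-of-scope";          // needs routing or a guardrail
  if (event.retrievedDocs === 0) return "corpus-gap"; // nothing to retrieve at all
  if (event.citationCount === 0) return "retrieval-ranking"; // content exists, not surfaced
  return "prompt-format"; // content was surfaced; generation quality issue
}
```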

Changes are tested, compared against a baseline, and only promoted when they improve answer quality without regressing citation clarity.

Rate limiting at the edge

Rate limiting is enforced in the Cloudflare Worker before any chat execution. Client identity is derived from a one-way hash of request context (IP + origin + user-agent) — raw IPs aren't stored as persistent identifiers.

Two guards run in sequence: an hourly quota and a burst interval. If either is exceeded, the API returns 429 with Retry-After. If KV storage is unavailable, the limiter degrades gracefully (fail-open) to preserve availability.
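The two-guard sequence might look like this, assuming a KV-like store with async `get`/`put`. The limits, key names, and TTLs are illustrative, not the production values:

```javascript
const HOURLY_LIMIT = 60;        // assumed: max requests per hourly bucket
const BURST_INTERVAL_MS = 2000; // assumed: minimum spacing between requests

async function checkLimits(kv, key, now = Date.now()) {
  try {
    const hourBucket = Math.floor(now / 3_600_000);
    const countKey = `rl:${key}:${hourBucket}`;
    const lastKey = `rl:${key}:last`;

    const [countRaw, lastRaw] = await Promise.all([kv.get(countKey), kv.get(lastKey)]);
    const count = Number(countRaw || 0);
    const last = lastRaw == null ? -Infinity : Number(lastRaw);

    // Guard 1: hourly quota. Retry-After points at the end of the bucket.
    if (count >= HOURLY_LIMIT) {
      return { allowed: false, retryAfter: 3600 - Math.floor((now % 3_600_000) / 1000) };
    }
    // Guard 2: burst interval between consecutive requests.
    if (now - last < BURST_INTERVAL_MS) {
      return { allowed: false, retryAfter: Math.ceil((BURST_INTERVAL_MS - (now - last)) / 1000) };
    }

    await Promise.all([
      kv.put(countKey, String(count + 1), { expirationTtl: 3600 }),
      kv.put(lastKey, String(now)),
    ]);
    return { allowed: true };
  } catch {
    // KV unavailable: fail open so availability wins over enforcement.
    return { allowed: true, degraded: true };
  }
}
```

The caller turns `{ allowed: false, retryAfter }` into a 429 response with a `Retry-After` header; the `catch` branch is the fail-open path.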

What I'd do next

  • Tighten citation quality metrics and regression gating for high-frequency questions
  • Promote successful A/B retrieval/prompt variants into default behavior
  • Expand corpus gap-closing using the weekly triage workflow

Try AskRich →

Ask it about architecture decisions, migration strategy, or platform delivery outcomes.

Full write-up with architecture diagrams and implementation detail on my site.
