<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: jaberoma</title>
    <description>The latest articles on DEV Community by jaberoma (@jaberoma_77).</description>
    <link>https://dev.to/jaberoma_77</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3949471%2Fde8115ba-a83e-4d6f-93ae-b8be1787f614.png</url>
      <title>DEV Community: jaberoma</title>
      <link>https://dev.to/jaberoma_77</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/jaberoma_77"/>
    <language>en</language>
    <item>
      <title>I built an AI résumé tool that refuses to lie about your experience</title>
      <dc:creator>jaberoma</dc:creator>
      <pubDate>Mon, 25 May 2026 17:53:13 +0000</pubDate>
      <link>https://dev.to/jaberoma_77/i-built-an-ai-resume-tool-that-refuses-to-lie-about-your-experience-n7b</link>
      <guid>https://dev.to/jaberoma_77/i-built-an-ai-resume-tool-that-refuses-to-lie-about-your-experience-n7b</guid>
      <description>&lt;p&gt;Most AI résumé tools have the same flaw: they hallucinate. Ask them to tailor your résumé for a job requiring "Rust experience" and they'll happily invent a Rust project you never worked on. It reads great — until the technical interview.&lt;/p&gt;

&lt;p&gt;I wanted the opposite. So I built &lt;strong&gt;Citevault&lt;/strong&gt;: a local-first résumé tailoring tool where every claim is either grounded in your own evidence, or refused and flagged as a gap.&lt;/p&gt;

&lt;p&gt;No fabrication. No API keys. Runs entirely on your laptop. &lt;em&gt;(Model weights are downloaded once during setup: Gemma through &lt;code&gt;ollama pull&lt;/code&gt;, the BGE models from Hugging Face on first boot. After that, no outbound connections.)&lt;/em&gt;&lt;/p&gt;

&lt;h2&gt;The core idea: claim-level grounding&lt;/h2&gt;

&lt;p&gt;Every bullet in your résumé starts as a &lt;em&gt;claim&lt;/em&gt;. Citevault processes each one through a pipeline:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;Retrieve&lt;/strong&gt; — hybrid BM25 + dense embedding search over your indexed evidence (master résumé, project READMEs, blog posts, anything you upload)&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Re-rank&lt;/strong&gt; — BGE cross-encoder scores the top candidates for relevance&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Verify&lt;/strong&gt; — Gemma 4 reads the claim alongside the retrieved span and gives a verdict: &lt;code&gt;SUPPORTS&lt;/code&gt;, &lt;code&gt;PARTIAL&lt;/code&gt;, &lt;code&gt;UNCLEAR&lt;/code&gt;, or &lt;code&gt;CONTRADICTS&lt;/code&gt;
&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Rewrite or refuse&lt;/strong&gt; — &lt;code&gt;SUPPORTS&lt;/code&gt; → the claim is verified and cited; &lt;code&gt;PARTIAL&lt;/code&gt; → rewritten to match only what the evidence actually says; &lt;code&gt;UNCLEAR&lt;/code&gt; → a rewrite is attempted, and if it still can't be grounded, refused and gap-reported; &lt;code&gt;CONTRADICTS&lt;/code&gt; → refused immediately and gap-reported&lt;/li&gt;
&lt;/ol&gt;
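
&lt;p&gt;A minimal Python sketch of the rewrite-or-refuse step follows. It assumes the verifier and rewriter are passed in as plain callables. &lt;code&gt;route_claim&lt;/code&gt;, &lt;code&gt;verify&lt;/code&gt;, and &lt;code&gt;rewrite&lt;/code&gt; are hypothetical names, not Citevault's API. Reading "still can't be grounded" as "a second verification pass does not return &lt;code&gt;SUPPORTS&lt;/code&gt;" is also an assumption.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from enum import Enum
from typing import Callable, Optional, Tuple


class Verdict(Enum):
    SUPPORTS = "SUPPORTS"
    PARTIAL = "PARTIAL"
    UNCLEAR = "UNCLEAR"
    CONTRADICTS = "CONTRADICTS"


def route_claim(
    claim: str,
    span: str,
    verify: Callable[[str, str], Verdict],   # hypothetical: LLM verifier call
    rewrite: Callable[[str, str], str],      # hypothetical: LLM rewrite call
) -> Tuple[Optional[str], bool]:
    """Return (final_claim, is_gap); final_claim is None when the claim is refused."""
    verdict = verify(claim, span)
    if verdict is Verdict.SUPPORTS:
        return claim, False                  # keep as-is and cite the span
    if verdict is Verdict.PARTIAL:
        return rewrite(claim, span), False   # narrow to what the span says
    if verdict is Verdict.UNCLEAR:
        candidate = rewrite(claim, span)
        # Assumption: "grounded" means a second pass returns SUPPORTS.
        if verify(candidate, span) is Verdict.SUPPORTS:
            return candidate, False
        return None, True                    # refuse and report the gap
    return None, True                        # CONTRADICTS: refuse immediately
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;Passing &lt;code&gt;verify&lt;/code&gt; and &lt;code&gt;rewrite&lt;/code&gt; in as arguments means the routing logic can be tested with stubs, without loading a model.&lt;/p&gt;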

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdgpriqdq44x8nz37f1ih.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdgpriqdq44x8nz37f1ih.png" alt="Tailoring running — real-time SSE stream showing retrieval and SUPPORTS verdicts as each claim is verified" width="799" height="597"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The result is a résumé where every bullet has a &lt;code&gt;[^sp-...]&lt;/code&gt; footnote traceable back to a specific span in your source material.&lt;/p&gt;
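
&lt;p&gt;The &lt;code&gt;[^sp-...]&lt;/code&gt; markers look like Markdown footnotes. If that is the format, rendering grounded bullets could look like the sketch below. &lt;code&gt;GroundedClaim&lt;/code&gt;, &lt;code&gt;render_with_footnotes&lt;/code&gt;, and the span ID &lt;code&gt;sp-demo1&lt;/code&gt; are illustrative and do not reflect the real ID scheme.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass
from typing import List


@dataclass
class GroundedClaim:
    text: str      # the résumé bullet
    span_id: str   # illustrative span ID such as "sp-demo1"
    source: str    # evidence document the span came from
    quote: str     # the supporting span text


def render_with_footnotes(claims: List[GroundedClaim]) -> str:
    """Render bullets with Markdown footnote markers, then the footnote definitions."""
    bullets = [f"- {c.text} [^{c.span_id}]" for c in claims]
    notes = [f'[^{c.span_id}]: {c.source}: "{c.quote}"' for c in claims]
    return "\n".join(bullets) + "\n\n" + "\n".join(notes)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;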

&lt;h2&gt;The wow demo: Naive Comparison Mode&lt;/h2&gt;

&lt;p&gt;Toggle "Compare with naive AI" before starting a tailoring run. Citevault runs its grounded pipeline &lt;em&gt;and&lt;/em&gt; a second single-pass run — same model, same evidence, same task description, no verification loop. The only difference is the grounded pipeline checks every claim against its source before including it.&lt;/p&gt;

&lt;p&gt;The diff is striking:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Grounded résumé:&lt;/strong&gt; seven bullets, every one backed by a citation footnote traceable to a source span&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Naive résumé:&lt;/strong&gt; longer, more confident-sounding — and full of placeholders like &lt;code&gt;[Candidate Name]&lt;/code&gt; and invented achievements that never appeared in the evidence&lt;/li&gt;
&lt;/ul&gt;
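
&lt;p&gt;Leftover placeholders are the easiest kind of fabrication to catch mechanically. The rough heuristic below flags bracketed Title Case tokens and ignores &lt;code&gt;[^...]&lt;/code&gt; footnote markers. The regex and &lt;code&gt;find_placeholders&lt;/code&gt; are hypothetical and not part of Citevault.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import re
from typing import List

# Bracketed Title Case tokens such as "[Candidate Name]" or "[Company]".
# Footnote markers like "[^sp-demo1]" start with "^", so they never match.
PLACEHOLDER = re.compile(r"\[[A-Z][A-Za-z]*(?: [A-Z][A-Za-z]*)*\]")


def find_placeholders(text: str) -> List[str]:
    """Return every unfilled placeholder left in a generated résumé."""
    return PLACEHOLDER.findall(text)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;This only catches literal placeholders. Invented achievements still need the claim-level verification described earlier.&lt;/p&gt;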

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgi1e7487fe0qrw4v8g9g.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fgi1e7487fe0qrw4v8g9g.png" alt="Tailoring result — verified grounded claims with citation footnotes; every bullet traces back to a source span" width="800" height="511"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff27qdmxwaujayos30igp.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ff27qdmxwaujayos30igp.png" alt="Diff view — Citevault grounded output (left) vs naive AI ungrounded output (right); the naive résumé invents structure and fills in placeholders the model fabricated" width="800" height="563"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;The AI stack (all local, no API keys)&lt;/h2&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Component&lt;/th&gt;
&lt;th&gt;Role&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;strong&gt;Gemma 4 E4B&lt;/strong&gt; (&lt;code&gt;gemma4:e4b&lt;/code&gt;) via Ollama&lt;/td&gt;
&lt;td&gt;Claim drafting, verification, cover letter composition&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BGE-small-en-v1.5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Dense embeddings for semantic retrieval&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BGE cross-encoder&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Re-ranking retrieved candidates&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;BM25 + SQLite FTS5&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Keyword retrieval (hybrid RAG)&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;sqlite-vec&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;Vector store — no external database required&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;
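
&lt;p&gt;SQLite's FTS5 extension has a built-in &lt;code&gt;bm25()&lt;/code&gt; ranking function, so the keyword half of the hybrid retriever needs no extra service. The sketch below uses Python's standard &lt;code&gt;sqlite3&lt;/code&gt; module and assumes the linked SQLite library was compiled with FTS5. The &lt;code&gt;evidence&lt;/code&gt; table and both function names are hypothetical, and the sqlite-vec dense side is not shown.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import sqlite3
from typing import Dict, List


def build_index(docs: Dict[int, str]) -> sqlite3.Connection:
    """Index evidence chunks in an in-memory FTS5 table (needs FTS5 in SQLite)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE VIRTUAL TABLE evidence USING fts5(body)")
    conn.executemany(
        "INSERT INTO evidence(rowid, body) VALUES (?, ?)", list(docs.items())
    )
    return conn


def keyword_search(conn: sqlite3.Connection, query: str, k: int = 5) -> List[int]:
    """Return rowids of the k best BM25 matches, best first."""
    rows = conn.execute(
        # FTS5's bm25() returns lower (more negative) values for better matches,
        # so ascending order puts the strongest match first.
        "SELECT rowid FROM evidence WHERE evidence MATCH ? "
        "ORDER BY bm25(evidence) LIMIT ?",
        (query, k),
    )
    return [rowid for (rowid,) in rows]
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;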

&lt;p&gt;Gemma 4 E4B was chosen specifically for this role. It follows instructions well enough to return structured JSON verdicts under a tightly constrained prompt. It is small enough to run on CPU without a GPU. It is also open-weight, so no API key is needed and no data leaves the machine. The &lt;code&gt;e4b&lt;/code&gt; tag is the Q4_K_M quantised build, a good size/quality tradeoff for local inference via Ollama.&lt;/p&gt;

&lt;p&gt;The entire stack runs on CPU. Measured on a 4-core/8-thread laptop with 32 GB RAM and no discrete GPU: 3–8 tokens/second generation speed, 20–30 minutes per tailoring run; add another 10–20 minutes if naive comparison is enabled. Slower than a cloud API, but zero cost, zero data exposure, and no dependency on an upstream service staying alive.&lt;/p&gt;

&lt;h2&gt;What I learned building this&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;Structured generation is the hard part.&lt;/strong&gt; Getting Gemma 4 to return well-formed JSON verdicts from the verifier, consistently, took more prompt iteration than anything else. The final verifier prompt is tightly constrained: it gives the model a specific rubric, a strict output format, and a worked example. The model still occasionally returns malformed output. Those claims are logged and left out of the résumé rather than silently passed through.&lt;/p&gt;
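
&lt;p&gt;The log-and-omit policy can be sketched as a small validator. This assumes the verifier is asked for an object like &lt;code&gt;{"verdict": "SUPPORTS"}&lt;/code&gt;. That schema and &lt;code&gt;parse_verdict&lt;/code&gt; are hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;import json
import logging
from typing import Optional

log = logging.getLogger("verifier")

ALLOWED_VERDICTS = {"SUPPORTS", "PARTIAL", "UNCLEAR", "CONTRADICTS"}


def parse_verdict(raw: str, claim_id: str) -> Optional[str]:
    """Return the verdict label, or None (after logging) if the output is malformed."""
    try:
        payload = json.loads(raw)
    except json.JSONDecodeError:
        log.warning("claim %s: verifier returned non-JSON output", claim_id)
        return None
    verdict = payload.get("verdict") if isinstance(payload, dict) else None
    if not isinstance(verdict, str) or verdict not in ALLOWED_VERDICTS:
        log.warning("claim %s: unexpected verdict %r", claim_id, verdict)
        return None
    return verdict
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;A &lt;code&gt;None&lt;/code&gt; return tells the caller to drop the claim rather than treat it as a passing verdict by default.&lt;/p&gt;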

&lt;p&gt;&lt;strong&gt;Hybrid RAG matters.&lt;/strong&gt; Pure dense search misses exact keyword matches, and pure BM25 misses semantic similarity. On the five-case golden eval set (described below), the hybrid combination raised the first-pass grounding rate by roughly 15 percentage points over either retrieval strategy alone. That was enough to tip borderline claims from UNCLEAR to SUPPORTS.&lt;/p&gt;
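
&lt;p&gt;The post does not say how the BM25 and dense result lists are merged. Reciprocal rank fusion (RRF) is one common method that combines rankings without comparing raw scores. The sketch below assumes RRF with the conventional constant &lt;code&gt;k = 60&lt;/code&gt;. &lt;code&gt;reciprocal_rank_fusion&lt;/code&gt; is a hypothetical helper.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from collections import defaultdict
from typing import Dict, List


def reciprocal_rank_fusion(rankings: List[List[str]], k: int = 60) -> List[str]:
    """Fuse best-first ranked lists of span IDs into one best-first list."""
    scores: Dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, span_id in enumerate(ranking, start=1):
            scores[span_id] += 1.0 / (k + rank)
    # Highest fused score first; ties broken by span ID so output is deterministic.
    return sorted(scores, key=lambda s: (-scores[s], s))
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;RRF uses only ranks, so BM25 scores and cosine similarities never have to be put on the same scale. A span that both retrievers rank reasonably well can outrank one that only a single retriever found, for example &lt;code&gt;reciprocal_rank_fusion([bm25_ids, dense_ids])&lt;/code&gt; with two hypothetical ID lists.&lt;/p&gt;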

&lt;p&gt;&lt;strong&gt;Eval-driven development pays off.&lt;/strong&gt; I built a golden evaluation set of five synthetic candidates and ran the pipeline against it after every significant change. The final first-pass grounding rate on that set is 98.2%. More importantly, the eval caught two regressions that had looked fine in manual testing.&lt;/p&gt;
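
&lt;p&gt;Here is a sketch of a regression gate. It assumes "first-pass grounding rate" means the share of all drafted claims that the verifier marks &lt;code&gt;SUPPORTS&lt;/code&gt; on the first pass, pooled across golden cases. &lt;code&gt;CaseResult&lt;/code&gt;, &lt;code&gt;grounding_rate&lt;/code&gt;, &lt;code&gt;check_regression&lt;/code&gt;, and the one-point tolerance are all hypothetical.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from dataclasses import dataclass
from typing import List


@dataclass
class CaseResult:
    case_id: str
    claims: int                 # claims drafted for this golden case
    first_pass_supported: int   # claims marked SUPPORTS on the first verification pass


def grounding_rate(results: List[CaseResult]) -> float:
    """Pooled first-pass grounding rate across all golden cases."""
    total = sum(r.claims for r in results)
    supported = sum(r.first_pass_supported for r in results)
    return supported / total if total else 0.0


def check_regression(results: List[CaseResult], baseline: float, tolerance: float = 0.01) -> None:
    """Fail loudly if the rate drops more than the tolerance below the recorded baseline."""
    rate = grounding_rate(results)
    if baseline - rate > tolerance:
        raise AssertionError(f"grounding rate fell from {baseline:.1%} to {rate:.1%}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;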

&lt;p&gt;&lt;strong&gt;Local-first is a real constraint, not a marketing line.&lt;/strong&gt; Your career data is sensitive. Résumés contain salary history, reasons for leaving, private project details. I didn't want to be a data controller. Building local-first forced specific architectural decisions — no cloud storage, no async job queue, no third-party embedding API.&lt;/p&gt;

&lt;h2&gt;Try it&lt;/h2&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight shell"&gt;&lt;code&gt;docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt; ollama
docker compose &lt;span class="nb"&gt;exec &lt;/span&gt;ollama ollama pull gemma4:e4b
docker compose up &lt;span class="nt"&gt;-d&lt;/span&gt;
&lt;span class="c"&gt;# Then open http://localhost:5173/admin in your browser&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Upload your evidence, paste a job posting, and watch the grounding happen in real time over a server-sent events (SSE) stream.&lt;/p&gt;
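
&lt;p&gt;SSE is a plain-text &lt;code&gt;text/event-stream&lt;/code&gt; format, so any client can consume it. The post does not document the endpoint or event names, so the parser sketch below is generic. &lt;code&gt;parse_sse&lt;/code&gt; is hypothetical. It handles only the &lt;code&gt;event&lt;/code&gt; and &lt;code&gt;data&lt;/code&gt; fields and ignores &lt;code&gt;id&lt;/code&gt; and &lt;code&gt;retry&lt;/code&gt;.&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight python"&gt;&lt;code&gt;from typing import Iterable, Iterator, List, Tuple


def parse_sse(lines: Iterable[str]) -> Iterator[Tuple[str, str]]:
    """Yield (event, data) pairs from a text/event-stream body, line by line."""
    event: str = "message"
    data: List[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:                      # a blank line dispatches the event
            if data:
                yield event, "\n".join(data)
            event, data = "message", []
        elif line.startswith(":"):        # comment, often a keep-alive
            continue
        else:
            field, _, value = line.partition(":")
            if value.startswith(" "):
                value = value[1:]         # one leading space is stripped
            if field == "event":
                event = value
            elif field == "data":
                data.append(value)
            # "id" and "retry" are ignored in this sketch
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;p&gt;To follow a live run, pass in the lines of an HTTP response body, for example from &lt;code&gt;urllib.request.urlopen&lt;/code&gt; with each line decoded as UTF-8.&lt;/p&gt;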

&lt;blockquote&gt;
&lt;p&gt;&lt;strong&gt;Heads up — this runs on CPU.&lt;/strong&gt; On a 4-core laptop without a GPU, expect 20–30 minutes per tailoring run. With naive comparison enabled, add another 10–20 minutes for the second pass. It is slow by cloud-API standards, but fully offline and costs nothing after the first model pull.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The best test: pick a role where you have a genuine skill gap — that is where the gap report is most useful.&lt;/p&gt;

&lt;p&gt;The full architecture (hexagonal layout, RAG pipeline, Docker Compose stack) is documented in &lt;a href="https://github.com/jaberoma/citevault/blob/main/docs/architecture.md" rel="noopener noreferrer"&gt;&lt;code&gt;docs/architecture.md&lt;/code&gt;&lt;/a&gt; in the repo.&lt;/p&gt;

&lt;p&gt;The code is on GitHub: &lt;strong&gt;&lt;a href="https://github.com/jaberoma/citevault" rel="noopener noreferrer"&gt;github.com/jaberoma/citevault&lt;/a&gt;&lt;/strong&gt; — MIT licensed, no account required, runs on any laptop with Docker.&lt;/p&gt;

&lt;p&gt;Citevault's contract is simple: every claim in your résumé either links to a source span in your own evidence, or it does not appear. No exceptions.&lt;/p&gt;

</description>
      <category>googleaisprint</category>
      <category>googlegemma</category>
      <category>gemmachallenge</category>
      <category>devchallenge</category>
    </item>
  </channel>
</rss>
