<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Prateek Puri</title>
    <description>The latest articles on DEV Community by Prateek Puri (@prateek_puri_a081b16b1c2d).</description>
    <link>https://dev.to/prateek_puri_a081b16b1c2d</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3953025%2Fd6796593-c29a-400b-bcd3-9127343d60d8.png</url>
      <title>DEV Community: Prateek Puri</title>
      <link>https://dev.to/prateek_puri_a081b16b1c2d</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/prateek_puri_a081b16b1c2d"/>
    <language>en</language>
    <item>
      <title>Why I got frustrated with AI job search tools and built my own</title>
      <dc:creator>Prateek Puri</dc:creator>
      <pubDate>Tue, 26 May 2026 19:14:40 +0000</pubDate>
      <link>https://dev.to/prateek_puri_a081b16b1c2d/why-i-got-frustrated-with-ai-job-search-tools-and-built-my-own-11ja</link>
      <guid>https://dev.to/prateek_puri_a081b16b1c2d/why-i-got-frustrated-with-ai-job-search-tools-and-built-my-own-11ja</guid>
      <description>&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fprateekpuri01%2Fmirror%2Fmain%2Fassets%2Fmirror-demo.gif" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fraw.githubusercontent.com%2Fprateekpuri01%2Fmirror%2Fmain%2Fassets%2Fmirror-demo.gif" alt="Mirror demo" width="559" height="310"&gt;&lt;/a&gt;&lt;br&gt;
A few months ago I built an AI resume tool to help with my own job search.&lt;br&gt;
The first version worked — it scraped jobs, scored them, and could&lt;br&gt;
generate a tailored resume per posting in about thirty seconds. I shipped&lt;br&gt;
it. I used it.&lt;/p&gt;

&lt;p&gt;After two weeks I noticed something: I was rewriting the same paragraphs&lt;br&gt;
over and over. Every resume I generated for a research role would have the same generic, third-person, marketing-flavored paragraph. I'd rewrite&lt;br&gt;
it to be active and voice-led, the way I actually talk about the work. Then I'd do it again on the next resume. And the next. The system had no idea I'd already&lt;br&gt;
fixed this exact paragraph fifteen times.&lt;/p&gt;

&lt;p&gt;I'd built a memory layer for this. It wasn't working.&lt;/p&gt;

&lt;p&gt;This post is about why, and the architecture I ended up with after&lt;br&gt;
deleting most of v1.&lt;/p&gt;
&lt;h2&gt;
  
  
  The original architecture
&lt;/h2&gt;

&lt;p&gt;The system had three things:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;
&lt;strong&gt;A single-shot resume generator&lt;/strong&gt; — one big LLM call that produced
the entire resume given the user's profile, the target job, and a
strategic plan.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A &lt;code&gt;writing_memory&lt;/code&gt; table&lt;/strong&gt; that stored "writing rules" extracted
from edit diffs by another LLM call.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;A confidence-scored injection layer&lt;/strong&gt; that pulled rules above 0.6
confidence into future generation prompts.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The flow looked roughly like this:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;User edits → background task feeds (before, after) to a classifier&lt;br&gt;
LLM → classifier returns &lt;code&gt;{rule_text: "use 'built' instead of&lt;br&gt;
'utilized'", category: "word_choice", confidence: 0.5}&lt;/code&gt; → rule sits&lt;br&gt;
at confidence 0.5 in the database → after 3 reinforcements the rule&lt;br&gt;
crosses 0.6 and starts appearing in future prompts.&lt;/p&gt;
&lt;/blockquote&gt;
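
&lt;p&gt;A minimal Python sketch of that gating logic. The 0.5 starting confidence and the 0.6 threshold come from the flow above; the per-reinforcement step of 0.04 and every name in the code are assumptions chosen so that three reinforcements cross the threshold, not the values Mirror actually uses.&lt;/p&gt;

```python
from dataclasses import dataclass

INJECTION_THRESHOLD = 0.6   # from the post: rules that cross 0.6 reach prompts
INITIAL_CONFIDENCE = 0.5    # from the post: new rules start at 0.5
REINFORCEMENT_STEP = 0.04   # assumption: picked so 3 reinforcements cross 0.6


@dataclass
class WritingRule:
    rule_text: str
    category: str
    confidence: float = INITIAL_CONFIDENCE

    def reinforce(self) -> None:
        """Bump confidence when the classifier extracts the same rule again."""
        self.confidence = min(1.0, self.confidence + REINFORCEMENT_STEP)


def rules_for_prompt(rules: list[WritingRule]) -> list[str]:
    """Return only the rule texts confident enough to inject into a prompt."""
    return [r.rule_text for r in rules if r.confidence >= INJECTION_THRESHOLD]


rule = WritingRule("use 'built' instead of 'utilized'", "word_choice")
seen_once = rules_for_prompt([rule])        # still at 0.5, nothing is injected
for _ in range(3):
    rule.reinforce()
seen_after_three = rules_for_prompt([rule])  # now past the threshold
```

&lt;p&gt;A rule corrected once or twice never leaves the database, which is the first failure mode described in the next section.&lt;/p&gt;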

&lt;p&gt;This is a perfectly reasonable design. It just doesn't address the&lt;br&gt;
actual problem.&lt;/p&gt;
&lt;h2&gt;
  
  
  Four reasons it underfit
&lt;/h2&gt;

&lt;p&gt;&lt;strong&gt;1. Style rules need repetition to surface.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The 0.6 confidence threshold required ~3 reinforcements before a rule&lt;br&gt;
fired. A user who corrected "leveraged" → "used" once and didn't get a&lt;br&gt;
chance to do it twice more saw the rule sit at 0.5 and never appear in&lt;br&gt;
prompts. The system felt forgetful even when it had captured the&lt;br&gt;
right signal.&lt;/p&gt;

&lt;p&gt;I could have lowered the threshold, but lowering it makes the&lt;br&gt;
classifier's noise dominate.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;2. Style rules are the wrong abstraction.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;Even when a rule fired correctly ("avoid passive constructions"), it&lt;br&gt;
couldn't capture the user's preferred &lt;em&gt;phrasing of a specific&lt;br&gt;
accomplishment&lt;/em&gt;. My hand-tuned MUSE description from the Cohere&lt;br&gt;
application — concrete, voice-matched, exactly the phrasing I wanted&lt;br&gt;
— was thrown away and re-generated from scratch every time I&lt;br&gt;
generated a new resume. The LLM had no way to know that text existed.&lt;/p&gt;

&lt;p&gt;This is the difference between &lt;em&gt;"the user prefers active verbs"&lt;/em&gt; (a&lt;br&gt;
style rule) and &lt;em&gt;"here's the exact paragraph the user wrote about MUSE&lt;br&gt;
last time, and the time before that"&lt;/em&gt; (per-entity content). The second&lt;br&gt;
one is what I actually wanted.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;3. Single-shot generation papered over redundancy.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The pre-rewrite generation prompt told the LLM:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"If an accomplishment appears in both &lt;code&gt;selected_research&lt;/code&gt; and an&lt;br&gt;
experience bullet, the bullet MUST say something completely&lt;br&gt;
different."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;This is an instruction. LLMs follow instructions inconsistently. Every&lt;br&gt;
few generations, the bullet would just paraphrase the research&lt;br&gt;
description for the same accomplishment, and I'd have to fix it&lt;br&gt;
manually. Not a bug — the LLM was doing the best it could with a&lt;br&gt;
soft constraint.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;4. The classifier was noisy.&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;The extraction LLM happily generated rules from any edit, including&lt;br&gt;
one-off content rewrites that weren't generalizable. Rules like&lt;br&gt;
&lt;em&gt;"Replace 'qualitative research at RAND' with 'manual qualitative&lt;br&gt;
research workflow'"&lt;/em&gt; would show up — that's a content edit, not a style&lt;br&gt;
rule, but the classifier didn't know the difference.&lt;/p&gt;
&lt;h2&gt;
  
  
  The mental model that emerged
&lt;/h2&gt;

&lt;p&gt;After staring at the failure modes for a while, I realized I had collapsed&lt;br&gt;
two distinct memory needs into one layer:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Layer&lt;/th&gt;
&lt;th&gt;Question it answers&lt;/th&gt;
&lt;th&gt;Granularity&lt;/th&gt;
&lt;th&gt;When it helps&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Style&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"What's the user's general voice?"&lt;/td&gt;
&lt;td&gt;Whole-resume&lt;/td&gt;
&lt;td&gt;Cold start — no prior version of this entity exists yet&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;strong&gt;Content&lt;/strong&gt;&lt;/td&gt;
&lt;td&gt;"What's the user's preferred phrasing for &lt;em&gt;this&lt;/em&gt; accomplishment / employer / skill bucket?"&lt;/td&gt;
&lt;td&gt;Per-entity&lt;/td&gt;
&lt;td&gt;Warm — user has touched this entity before&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;&lt;code&gt;writing_memory&lt;/code&gt; only addressed the first. There was no system for the&lt;br&gt;
second — and that's where most of the frustration lived.&lt;/p&gt;
&lt;h2&gt;
  
  
  Two-tier memory
&lt;/h2&gt;

&lt;p&gt;I added a new table, &lt;code&gt;content_memory&lt;/code&gt;, that stores the user's&lt;br&gt;
hand-tuned final text &lt;strong&gt;per entity&lt;/strong&gt;, keyed on the underlying domain&lt;br&gt;
object:&lt;/p&gt;

&lt;div class="table-wrapper-paragraph"&gt;&lt;table&gt;
&lt;thead&gt;
&lt;tr&gt;
&lt;th&gt;Entity type&lt;/th&gt;
&lt;th&gt;Key&lt;/th&gt;
&lt;th&gt;What's stored&lt;/th&gt;
&lt;/tr&gt;
&lt;/thead&gt;
&lt;tbody&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;research_description&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;&lt;code&gt;accomplishment_id&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The user's prose for that research entry&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;experience_bullets_set&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;employer_key&lt;/code&gt; (e.g. &lt;code&gt;rand_corporation&lt;/code&gt;)&lt;/td&gt;
&lt;td&gt;The full final array of bullets for that employer&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;&lt;code&gt;skill_bucket&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;
&lt;code&gt;ai_systems&lt;/code&gt; / &lt;code&gt;data_science&lt;/code&gt; / &lt;code&gt;engineering&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;The user's curated comma-separated list&lt;/td&gt;
&lt;/tr&gt;
&lt;tr&gt;
&lt;td&gt;
&lt;code&gt;summary&lt;/code&gt; / &lt;code&gt;tagline&lt;/code&gt;
&lt;/td&gt;
&lt;td&gt;&lt;code&gt;__scalar__&lt;/code&gt;&lt;/td&gt;
&lt;td&gt;The user's text&lt;/td&gt;
&lt;/tr&gt;
&lt;/tbody&gt;
&lt;/table&gt;&lt;/div&gt;

&lt;p&gt;The unique constraint is &lt;code&gt;(entity_type, entity_key, source_doc_id)&lt;/code&gt; —&lt;br&gt;
that combination matters. &lt;strong&gt;Within one resume&lt;/strong&gt;, every edit on the same&lt;br&gt;
entity overwrites the same row. So a session of three FINRA-bullet edits&lt;br&gt;
collapses to a single final-state record, not three diff records.&lt;br&gt;
&lt;strong&gt;Across resumes&lt;/strong&gt;, the same entity (say, &lt;code&gt;rand-muse&lt;/code&gt; research description)&lt;br&gt;
accumulates a row per resume.&lt;/p&gt;
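
&lt;p&gt;The overwrite-within, accumulate-across behavior can be sketched with an upsert on that unique constraint. This is an illustration only: it uses Python's built-in &lt;code&gt;sqlite3&lt;/code&gt; as a stand-in for the real database, and the &lt;code&gt;final_text&lt;/code&gt; column and both helper functions are hypothetical names, not Mirror's schema.&lt;/p&gt;

```python
import sqlite3

# Assumption: in-memory SQLite stands in for the project's real database.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE content_memory (
        entity_type   TEXT NOT NULL,
        entity_key    TEXT NOT NULL,
        source_doc_id TEXT NOT NULL,
        final_text    TEXT NOT NULL,  -- hypothetical column name
        UNIQUE (entity_type, entity_key, source_doc_id)
    )
    """
)


def remember(entity_type: str, entity_key: str, source_doc_id: str, final_text: str) -> None:
    """Upsert: a repeat edit within one resume overwrites the existing row."""
    conn.execute(
        """
        INSERT INTO content_memory (entity_type, entity_key, source_doc_id, final_text)
        VALUES (?, ?, ?, ?)
        ON CONFLICT (entity_type, entity_key, source_doc_id)
        DO UPDATE SET final_text = excluded.final_text
        """,
        (entity_type, entity_key, source_doc_id, final_text),
    )


def past_versions(entity_type: str, entity_key: str) -> list[str]:
    """Exact-key lookup of every stored version, in insertion order."""
    rows = conn.execute(
        "SELECT final_text FROM content_memory "
        "WHERE entity_type = ? AND entity_key = ? ORDER BY rowid",
        (entity_type, entity_key),
    ).fetchall()
    return [text for (text,) in rows]


# Three edits inside one resume collapse to a single final-state row...
for draft in ("draft 1", "draft 2", "final for resume A"):
    remember("research_description", "rand-muse", "resume_a", draft)
# ...while a second resume adds its own row for the same entity.
remember("research_description", "rand-muse", "resume_b", "final for resume B")
versions = past_versions("research_description", "rand-muse")
```

&lt;p&gt;Here &lt;code&gt;versions&lt;/code&gt; holds one entry per resume, which is the shape the generation step needs.&lt;/p&gt;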

&lt;p&gt;That accumulated history is the corpus the agent grounds on at&lt;br&gt;
generation time. After ingesting twelve past resumes, my MUSE&lt;br&gt;
accomplishment had eleven different hand-tuned versions in&lt;br&gt;
&lt;code&gt;content_memory&lt;/code&gt;, each tagged with the role I wrote it for:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Anthropic Research Engineer: &lt;em&gt;"Designed and built an internal
human-AI research platform…"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Cohere Lead Data Scientist: &lt;em&gt;"Replaced a manual qualitative research
workflow with an AI-assisted system…"&lt;/em&gt;
&lt;/li&gt;
&lt;li&gt;Cohere MTS Data Analysis: &lt;em&gt;"Designed and developed a platform to
replace largely manual…"&lt;/em&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;These don't get copied verbatim into new resumes. They go into the&lt;br&gt;
prompt as &lt;strong&gt;soft grounding&lt;/strong&gt;:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;## Your past hand-tuned versions for this content
Use these as grounding for tone, phrasing, and emphasis. Do NOT copy
verbatim — adapt to the current job context.

### Most recent (written for: Senior ML Engineer @ Anthropic)
Designed and built an internal human-AI research platform...

### Earlier (written for: Lead Data Scientist @ Cohere)
Replaced a manual qualitative research workflow...
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;
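
&lt;p&gt;Assembling that block is plain string building. A rough sketch, assuming past versions arrive newest first; the &lt;code&gt;PastVersion&lt;/code&gt; type and &lt;code&gt;build_grounding_block&lt;/code&gt; are illustrative names, and the wording is lightly paraphrased from the block above.&lt;/p&gt;

```python
from dataclasses import dataclass


@dataclass
class PastVersion:
    text: str
    role: str      # e.g. "Lead Data Scientist"
    company: str   # e.g. "Cohere"


def build_grounding_block(versions: list[PastVersion]) -> str:
    """Render past versions (assumed newest first) as a soft-grounding section."""
    if not versions:
        return ""  # cold start: no content memory for this entity yet
    lines = [
        "## Your past hand-tuned versions for this content",
        "Use these as grounding for tone, phrasing, and emphasis. Do NOT copy",
        "verbatim; adapt to the current job context.",
    ]
    for index, version in enumerate(versions):
        label = "Most recent" if index == 0 else "Earlier"
        lines.append("")
        lines.append(f"### {label} (written for: {version.role} @ {version.company})")
        lines.append(version.text)
    return "\n".join(lines)


block = build_grounding_block([
    PastVersion("Designed and built an internal human-AI research platform...",
                "Senior ML Engineer", "Anthropic"),
    PastVersion("Replaced a manual qualitative research workflow...",
                "Lead Data Scientist", "Cohere"),
])
```

&lt;p&gt;Returning an empty string for an entity with no history lets the caller fall back to style rules without special-casing the prompt template.&lt;/p&gt;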



&lt;p&gt;The agent now sees, for any given entity, &lt;em&gt;the user's actual voice&lt;br&gt;
across multiple roles&lt;/em&gt;, with the role context attached. New&lt;br&gt;
generations adapt content to the new role but inherit voice from the&lt;br&gt;
corpus. That single change made the second generation of any given&lt;br&gt;
resume feel dramatically more like me.&lt;/p&gt;

&lt;p&gt;&lt;code&gt;writing_memory&lt;/code&gt; stayed — but as a &lt;strong&gt;fallback&lt;/strong&gt; signal that handles&lt;br&gt;
genuinely abstract style preferences (banned words, sentence-form&lt;br&gt;
rules) for entities that don't have &lt;code&gt;content_memory&lt;/code&gt; rows yet. Its&lt;br&gt;
extraction was narrowed to only fire on summary, tagline, and skills&lt;br&gt;
edits, since those are the paths where abstract style is actually the&lt;br&gt;
right granularity.&lt;/p&gt;
&lt;h2&gt;
  
  
  Staged generation, with structural cross-section dedup
&lt;/h2&gt;

&lt;p&gt;Memory was half the problem. The other half was that single-shot&lt;br&gt;
generation made the LLM coordinate five sections in one go — and the&lt;br&gt;
"same accomplishment can't appear in both research and bullets"&lt;br&gt;
constraint was an instruction it would honor maybe 80% of the time.&lt;/p&gt;

&lt;p&gt;I split the single shot into a pipeline:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;1. Strategic plan
2. Selection (which accomplishments / employers go into the resume)
3. Parallel: research entries + skill buckets + publications selection
4. Parallel: bullets per employer — receives the FINALIZED research
   entries as part of its input
5. Critic — audits the assembled draft and flags specific issues
   per entity
6. Refiner — re-runs ONLY flagged entities with critic notes
7. Summary + tagline — synthesizes the post-refiner draft
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Stage 4 is the key change. The bullet generator literally receives the&lt;br&gt;
finalized research entries as part of its prompt, with explicit&lt;br&gt;
instruction: &lt;em&gt;"For any bullet whose accomplishment_id appears above,&lt;br&gt;
take a different angle."&lt;/em&gt; The cross-section dedup constraint is now&lt;br&gt;
backed by data flow, not just an abstract instruction: the generator sees the exact text it must&lt;br&gt;
not repeat, so the constraint holds far more reliably.&lt;/p&gt;
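
&lt;p&gt;The dependency between stages 3 and 4 might look like the sketch below. It is sequential for brevity (the real pipeline runs each stage's calls in parallel), and the &lt;code&gt;LLM&lt;/code&gt; callable, the function names, and the prompt wording other than the quoted instruction are all assumptions.&lt;/p&gt;

```python
from typing import Callable

LLM = Callable[[str], str]  # hypothetical: takes a prompt, returns generated text


def generate_research(llm: LLM, accomplishment_ids: list[str]) -> dict[str, str]:
    """Stage 3: one research entry per selected accomplishment."""
    return {aid: llm(f"Write a research entry for {aid}.") for aid in accomplishment_ids}


def bullet_prompt(employer_key: str, research: dict[str, str]) -> str:
    """Stage 4 prompt: embeds the finalized research text the bullets must not repeat."""
    finalized = "\n".join(f"- [{aid}] {text}" for aid, text in research.items())
    return (
        f"Write experience bullets for {employer_key}.\n"
        "Finalized research entries:\n"
        f"{finalized}\n"
        "For any bullet whose accomplishment_id appears above, take a different angle."
    )


def generate_bullets(llm: LLM, employer_keys: list[str], research: dict[str, str]) -> dict[str, str]:
    """Stage 4 runs only after stage 3 has returned, so its input is final."""
    return {key: llm(bullet_prompt(key, research)) for key in employer_keys}


prompts_seen: list[str] = []


def fake_llm(prompt: str) -> str:
    """Stand-in model: records each prompt and echoes its first line."""
    prompts_seen.append(prompt)
    return "OUTPUT: " + prompt.splitlines()[0]


research = generate_research(fake_llm, ["rand-muse"])
bullets = generate_bullets(fake_llm, ["rand_corporation"], research)
```

&lt;p&gt;The second recorded prompt contains the research entry verbatim, so the bullet stage cannot start from a stale or missing version of it.&lt;/p&gt;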

&lt;p&gt;Stage 6 (refiner) was equally important. An earlier design had a&lt;br&gt;
critic that rewrote the whole draft. The problem: my hand-tuned&lt;br&gt;
phrasings (now memorized in &lt;code&gt;content_memory&lt;/code&gt; and used as grounding)&lt;br&gt;
would get silently rewritten by the critic, undoing the memory work.&lt;br&gt;
Targeted refinement preserves phrasings the critic doesn't flag.&lt;/p&gt;
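
&lt;p&gt;In code, targeted refinement reduces to a dictionary pass that only touches flagged keys. A small sketch under assumed names: the entity-key format and the &lt;code&gt;rewrite&lt;/code&gt; callable are hypothetical stand-ins for the real refiner call.&lt;/p&gt;

```python
from typing import Callable


def refine(
    draft: dict[str, str],
    critic_notes: dict[str, str],
    rewrite: Callable[[str, str], str],
) -> dict[str, str]:
    """Re-run only the entities the critic flagged; copy the rest through unchanged."""
    return {
        entity: rewrite(text, critic_notes[entity]) if entity in critic_notes else text
        for entity, text in draft.items()
    }


draft = {
    "research:rand-muse": "Designed and built an internal human-AI research platform...",
    "bullets:rand_corporation": "Responsible for various analytics tasks.",
}
notes = {"bullets:rand_corporation": "Vague; name the concrete outcome."}
# Hypothetical rewrite function standing in for an LLM call.
refined = refine(draft, notes, lambda text, note: f"[rewritten per note: {note}]")
```

&lt;p&gt;The unflagged research entry comes back byte-for-byte identical, which is the property that protects hand-tuned phrasing.&lt;/p&gt;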

&lt;p&gt;Every leaf LLM call's full prompt + response gets dumped to&lt;br&gt;
&lt;code&gt;output/traces/{trace_id}/{stage}.txt&lt;/code&gt;. When something comes out&lt;br&gt;
weird, the trace files are the audit trail.&lt;/p&gt;

&lt;h2&gt;
  
  
  The unexpected fight: voice mirroring
&lt;/h2&gt;

&lt;p&gt;After the pipeline split landed, I generated a fresh resume against&lt;br&gt;
a job posting at Surge. The research description came back as:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Qualitative research at my firm moved from a mostly manual workflow&lt;br&gt;
to a blended human-AI system that can support structured coding,&lt;br&gt;
thematic synthesis, and policy analysis at production scale."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;Passive. Subject-the-work, not subject-the-actor. Awkward.&lt;/p&gt;

&lt;p&gt;But the grounding block had shown the LLM eleven previous versions,&lt;br&gt;
all leading with active verbs. What was happening?&lt;/p&gt;

&lt;p&gt;The system prompt had this rule:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Sentence 1: the transformation — what's different now because&lt;br&gt;
this work exists."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;The model had resolved the conflict between &lt;em&gt;"transformation-led"&lt;/em&gt;&lt;br&gt;
and the grounding examples by going passive ("research moved from"),&lt;br&gt;
which technically honors transformation framing. The grounding was&lt;br&gt;
losing to the explicit rule.&lt;/p&gt;

&lt;p&gt;The fix was a single paragraph added to the leaf prompt:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"If a "Your past hand-tuned versions" block appears in the user&lt;br&gt;
message, those are the candidate's own prior phrasings of THIS SAME&lt;br&gt;
accomplishment. Treat them as the source of TRUTH for voice and&lt;br&gt;
style. MIRROR the opening verb structure. Do NOT switch to passive&lt;br&gt;
transformation framings — that is not the candidate's voice and&lt;br&gt;
will be rejected."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;After that, the same inputs produced:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;&lt;em&gt;"Designed and built a blended human-AI research platform for&lt;br&gt;
structured coding, thematic synthesis, and policy analysis that&lt;br&gt;
turned messy qualitative inputs into reliable, reviewable outputs..."&lt;/em&gt;&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That sounds like me writing about my work. The grounding finally won.&lt;/p&gt;

&lt;p&gt;This is the kind of thing that's hard to anticipate before you ship.&lt;br&gt;
You build the architecture, you wire up the data, and then a soft&lt;br&gt;
prompt rule and a hard data signal fight, and the soft rule wins&lt;br&gt;
because LLMs really do try to follow instructions.&lt;/p&gt;

&lt;h2&gt;
  
  
  Catching this in CI: semantic eval
&lt;/h2&gt;

&lt;p&gt;I didn't want the next prompt change I made to silently break voice&lt;br&gt;
mirroring again. That meant building an eval suite — but not the&lt;br&gt;
typical "did the function return the expected value" kind. Voice&lt;br&gt;
quality isn't unit-testable.&lt;/p&gt;

&lt;p&gt;I wrote a multi-turn eval that simulates a user clicking on a section,&lt;br&gt;
asking for a rewrite, then iterating with feedback. Three scenarios:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tighten a bullet → tighten more → end with the scale figure.&lt;/li&gt;
&lt;li&gt;Rewrite a research description with explicit "lead with 'Replaced'"
feedback, then drop the duplicate last sentence.&lt;/li&gt;
&lt;li&gt;Clean up a skills bucket with cross-bucket dedup feedback.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Each turn is graded by &lt;strong&gt;a separate LLM call&lt;/strong&gt; acting as judge,&lt;br&gt;
returning structured pass/fail per axis:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;code&gt;respects_instruction&lt;/code&gt; — did the rewrite actually do what the user
asked?&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;no_fabrication&lt;/code&gt; — are all facts in the new value present in the
underlying accomplishment data?&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;differs_from_prior&lt;/code&gt; — is the rewrite materially different from the
prior turn? (Trivial whitespace tweaks count as fail.)&lt;/li&gt;
&lt;li&gt;
&lt;code&gt;voice_matches_grounding&lt;/code&gt; — when past versions are shown, does the
new value mirror their opening-verb pattern?&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The eval is in CI. PRs that touch prompts, the agent, or the memory&lt;br&gt;
layer trigger a workflow that boots a Postgres service container,&lt;br&gt;
seeds the DB from a fictional sample profile, runs the eval, and&lt;br&gt;
posts the per-turn judge output as a PR comment. The job fails if&lt;br&gt;
the aggregate pass rate drops below 80%.&lt;/p&gt;
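
&lt;p&gt;The gate itself is simple arithmetic over the judge's verdicts. A sketch, assuming each turn's verdict is a dict of axis name to pass/fail and that axes which don't apply to a turn are simply absent; only the four axis names and the 80% threshold come from this post.&lt;/p&gt;

```python
AXES = (
    "respects_instruction",
    "no_fabrication",
    "differs_from_prior",
    "voice_matches_grounding",
)
PASS_THRESHOLD = 0.80  # from the post: the job fails below 80%


def aggregate_pass_rate(verdicts: list[dict[str, bool]]) -> float:
    """Fraction of graded checks that passed, across all turns.

    Assumption: an axis missing from a verdict was not graded for that turn.
    """
    graded = [v[axis] for v in verdicts for axis in AXES if axis in v]
    if not graded:
        raise ValueError("no graded checks")
    return sum(graded) / len(graded)


def gate(verdicts: list[dict[str, bool]]) -> bool:
    """True when the CI job should pass."""
    return aggregate_pass_rate(verdicts) >= PASS_THRESHOLD


verdicts = [
    {"respects_instruction": True, "no_fabrication": True, "differs_from_prior": True},
    {"respects_instruction": True, "no_fabrication": False, "differs_from_prior": True},
]
rate = aggregate_pass_rate(verdicts)  # 5 of 6 checks passed
ci_passes = gate(verdicts)
```

&lt;p&gt;An aggregate threshold tolerates a single noisy judge call; a stricter variant could fail the job on any &lt;code&gt;no_fabrication&lt;/code&gt; miss regardless of the overall rate.&lt;/p&gt;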

&lt;p&gt;Current pass rate on the three scenarios: &lt;strong&gt;26 / 27 graded checks&lt;/strong&gt;.&lt;br&gt;
The one failure is a real finding the eval is supposed to catch — on a&lt;br&gt;
skill-bucket edit the agent silently added "Claude Code" (not in the&lt;br&gt;
user's whitelist). The judge correctly flagged "no_fabrication: fail."&lt;/p&gt;

&lt;p&gt;This is the kind of regression that would otherwise ship invisibly. Unit&lt;br&gt;
tests can't see it, because nothing crashes and every return value is still well-formed. The eval is what lets me push prompt&lt;br&gt;
changes without holding my breath.&lt;/p&gt;

&lt;h2&gt;
  
  
  What this is and isn't
&lt;/h2&gt;

&lt;p&gt;It's tempting to call this "AI memory" and reach for vector embeddings.&lt;br&gt;
I didn't. Every entity here has a stable ID — &lt;code&gt;accomplishment_id&lt;/code&gt;,&lt;br&gt;
&lt;code&gt;employer_key&lt;/code&gt;, bucket name — and the right key is the domain object,&lt;br&gt;
not a similarity search. Embeddings would be the natural reach if&lt;br&gt;
we were doing fuzzy retrieval. We're not. We're doing exact lookup.&lt;/p&gt;

&lt;p&gt;It's also tempting to make the past versions a verbatim cache and&lt;br&gt;
just splice them in. I didn't do that either — past versions inform&lt;br&gt;
voice; they don't replace text. The split between &lt;strong&gt;soft grounding&lt;/strong&gt;&lt;br&gt;
(in prompts) and &lt;strong&gt;explicit "Past versions ▼" UI swap&lt;/strong&gt; (deterministic&lt;br&gt;
text replacement) matters. The user can adapt content to a new role&lt;br&gt;
while inheriting voice; they can also choose to slot in a past&lt;br&gt;
version verbatim if that's what they want. Two different mechanisms&lt;br&gt;
for two different needs.&lt;/p&gt;

&lt;p&gt;And it's not provider-locked. The whole stack runs against OpenAI by&lt;br&gt;
default and against Anthropic / Ollama via their OpenAI-compat&lt;br&gt;
endpoints. The factory is ~50 lines. Most of the design carries&lt;br&gt;
over to any chat-completion API.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I'd do differently
&lt;/h2&gt;

&lt;p&gt;A few things I'd revisit if I were starting over:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;
&lt;strong&gt;Bullet-set memory at the bullet level.&lt;/strong&gt; Today an entire employer's
bullet array is one row, keyed on &lt;code&gt;employer_key&lt;/code&gt;. Adding one new
bullet overwrites the whole row. Synthetic, stable &lt;code&gt;bullet_id&lt;/code&gt; values would
let unchanged bullets persist across edits.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Per-role-family scoping.&lt;/strong&gt; The user might want different
memorized phrasings for ML-research vs. applied-AI roles. Today the
agent just sees the &lt;code&gt;job_context&lt;/code&gt; JSON inline and pattern-matches.&lt;/li&gt;
&lt;li&gt;
&lt;strong&gt;Semantic eval earlier.&lt;/strong&gt; I built the eval suite &lt;em&gt;after&lt;/em&gt; I'd
shipped the staged pipeline. Several iterations of prompt-tuning
could have been validated mechanically instead of by re-reading
generations and squinting.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Code
&lt;/h2&gt;

&lt;p&gt;If you want to read the implementation:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory schema: &lt;a href="https://github.com/prateekpuri01/mirror/blob/main/backend/app/models/content_memory.py" rel="noopener noreferrer"&gt;&lt;code&gt;backend/app/models/content_memory.py&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Capture path: &lt;a href="https://github.com/prateekpuri01/mirror/blob/main/backend/app/routers/documents.py" rel="noopener noreferrer"&gt;&lt;code&gt;_learn_from_inline_edit&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Generation pipeline: &lt;a href="https://github.com/prateekpuri01/mirror/blob/main/backend/app/ai/resume_pipeline.py" rel="noopener noreferrer"&gt;&lt;code&gt;backend/app/ai/resume_pipeline.py&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Eval suite: &lt;a href="https://github.com/prateekpuri01/mirror/blob/main/backend/scripts/eval/eval_focused_edit.py" rel="noopener noreferrer"&gt;&lt;code&gt;backend/scripts/eval/eval_focused_edit.py&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;li&gt;Full design doc: &lt;a href="https://github.com/prateekpuri01/mirror/blob/main/docs/MEMORY_DESIGN.md" rel="noopener noreferrer"&gt;&lt;code&gt;docs/MEMORY_DESIGN.md&lt;/code&gt;&lt;/a&gt;
&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Mirror is MIT-licensed. It's designed for local single-user deployment&lt;br&gt;
— no auth, no hosted service. If you have a different application&lt;br&gt;
where you'd want to remember a user's voice across sessions, the&lt;br&gt;
architecture transfers cleanly: stable entity IDs, soft grounding, a&lt;br&gt;
critic that flags rather than rewrites, and an eval suite that catches&lt;br&gt;
regressions that unit tests can't see.&lt;/p&gt;

&lt;p&gt;— &lt;em&gt;Prateek&lt;/em&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>opensource</category>
      <category>machinelearning</category>
      <category>programming</category>
    </item>
  </channel>
</rss>
