<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: P AASHISH </title>
    <description>The latest articles on DEV Community by P AASHISH  (@p_aashish1dt24ec069_3000).</description>
    <link>https://dev.to/p_aashish1dt24ec069_3000</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3840543%2Faf0b20ba-1c84-4660-8e6a-39e3c6b22bc3.png</url>
      <title>DEV Community: P AASHISH </title>
      <link>https://dev.to/p_aashish1dt24ec069_3000</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/p_aashish1dt24ec069_3000"/>
    <language>en</language>
    <item>
      <title>How I structured logs around Hindsight</title>
      <dc:creator>P AASHISH </dc:creator>
      <pubDate>Mon, 23 Mar 2026 17:24:00 +0000</pubDate>
      <link>https://dev.to/p_aashish1dt24ec069_3000/how-i-structured-logs-around-hindsight-55b1</link>
      <guid>https://dev.to/p_aashish1dt24ec069_3000/how-i-structured-logs-around-hindsight-55b1</guid>
      <description>&lt;p&gt;“Why did it reject a perfect resume?” I dug into the logs and realized Hindsight had quietly rewritten the agent’s scoring logic based on one bad feedback loop.&lt;br&gt;
job sense ai&lt;/p&gt;

&lt;p&gt;“Why did it reject a perfect resume?” I dug into the logs and realized Hindsight had quietly rewritten the agent’s scoring logic based on one bad feedback loop.&lt;/p&gt;

&lt;p&gt;What I actually built&lt;/p&gt;

&lt;p&gt;This project is a job matching agent that reads resumes, scores candidates, and ranks them against job descriptions. Nothing fancy on the surface: parse resume → extract features → score → return top candidates.&lt;/p&gt;

&lt;p&gt;The interesting part is that the scoring logic isn’t fixed.&lt;/p&gt;

&lt;p&gt;I wired it up with the Hindsight GitHub repository so the agent could learn from feedback—things like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;“This candidate should have been ranked higher”&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“This profile is irrelevant despite keyword match”&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Instead of retraining a model, I let the agent adapt its behavior by replaying past decisions and corrections.&lt;/p&gt;

&lt;p&gt;How the system is structured&lt;/p&gt;

&lt;p&gt;At a high level, the code splits into three parts:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Resume ingestion + parsing&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Scoring pipeline&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hindsight-backed memory + feedback loop&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;The scoring pipeline looks roughly like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def score_candidate(resume, job_description):
    features = extract_features(resume, job_description)
    base_score = weighted_score(features)
    adjustments = hindsight_adjustments(resume, job_description)
    return base_score + adjustments
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;The key is that hindsight_adjustments isn’t static. It’s derived from past feedback stored and replayed through Hindsight.&lt;/p&gt;

&lt;p&gt;Feedback events are stored with context:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;event = {
    "resume_id": resume.id,
    "job_id": job.id,
    "original_score": score,
    "feedback": "should_rank_higher",
    "timestamp": now()
}
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;These get indexed and replayed later when similar candidates show up.&lt;/p&gt;

&lt;p&gt;If you’ve read the Hindsight documentation, this is basically using event replay as a lightweight learning layer instead of retraining.&lt;/p&gt;
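&lt;p&gt;To make that concrete, here is a minimal sketch of the replay idea. The FEEDBACK_DELTAS table and the event shape are my own illustration, not Hindsight’s actual API:&lt;/p&gt;

```python
# Hypothetical sketch of replay-as-learning: fold stored feedback
# events into a single score adjustment. Names and values are
# illustrative, not part of Hindsight itself.
FEEDBACK_DELTAS = {
    "should_rank_higher": 0.1,
    "not_relevant": -0.2,
}

def aggregate_adjustments(events):
    """Sum the per-event deltas implied by past feedback."""
    return sum(FEEDBACK_DELTAS.get(e["feedback"], 0.0) for e in events)

events = [
    {"feedback": "should_rank_higher"},
    {"feedback": "should_rank_higher"},
    {"feedback": "not_relevant"},
]
print(aggregate_adjustments(events))
```

&lt;p&gt;The point is just that “learning” here is an aggregation over stored events, which is exactly why one bad event can move scores.&lt;/p&gt;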

&lt;p&gt;The bug that made this interesting&lt;/p&gt;

&lt;p&gt;Everything seemed fine until I noticed something weird:&lt;/p&gt;

&lt;p&gt;A strong candidate—clean experience, perfect keyword match—was consistently ranked low.&lt;/p&gt;

&lt;p&gt;At first I thought:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;parsing bug?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;feature extraction issue?&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;bad weights?&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Nope.&lt;/p&gt;

&lt;p&gt;It was Hindsight.&lt;/p&gt;

&lt;p&gt;What actually happened&lt;/p&gt;

&lt;p&gt;One recruiter had marked a similar resume as “not relevant” earlier. That feedback got stored and replayed.&lt;/p&gt;

&lt;p&gt;But the similarity match was too broad.&lt;/p&gt;

&lt;p&gt;So now:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;New candidate comes in&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hindsight finds a “similar” past event&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Applies a negative adjustment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Score drops silently&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;No logs screamed “this is wrong.” It just looked like the system “decided” differently.&lt;/p&gt;

&lt;p&gt;Debugging the feedback loop&lt;/p&gt;

&lt;p&gt;I had to explicitly trace how Hindsight was influencing decisions.&lt;/p&gt;

&lt;p&gt;I added logging like this:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def hindsight_adjustments(resume, job):
    events = hindsight.retrieve_similar(resume, job)

    for e in events:
        print("Replaying event:", e)

    return aggregate_adjustments(events)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;That’s when it clicked:&lt;/p&gt;

&lt;p&gt;The system wasn’t wrong; it was too eager to generalize.&lt;/p&gt;

&lt;p&gt;The feedback loop had effectively created a soft rule:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Candidates like this are bad”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;…based on a single data point.&lt;/p&gt;

&lt;p&gt;Fixing it without killing learning&lt;/p&gt;

&lt;p&gt;I didn’t want to remove Hindsight—it was the whole point.&lt;/p&gt;

&lt;p&gt;Instead, I constrained it.&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Tightened similarity matching&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Instead of loose matching, I added stricter filters:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;if similarity_score &amp;lt; 0.85:
    continue
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;This alone reduced a lot of bad carryover.&lt;/p&gt;
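&lt;p&gt;As a rough sketch, the gate can live in the replay path before any adjustment is computed. The helper below is my own illustration; the precomputed similarity scores and the function name are assumptions, not Hindsight internals:&lt;/p&gt;

```python
# Illustrative sketch: gate replayed events on a similarity threshold
# (0.85, as in the post). Events and similarities are parallel lists
# for simplicity; a real retrieval layer would return pairs.
SIMILARITY_THRESHOLD = 0.85

def filter_events(events, similarities, threshold=SIMILARITY_THRESHOLD):
    """Keep only events whose similarity clears the threshold."""
    return [e for e, s in zip(events, similarities) if s >= threshold]

events = ["evt_a", "evt_b", "evt_c"]
sims = [0.91, 0.62, 0.87]
print(filter_events(events, sims))  # prints ['evt_a', 'evt_c']
```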

&lt;ol start="2"&gt;
&lt;li&gt;Weighted feedback by frequency&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;One-off feedback shouldn’t dominate:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;adjustment = feedback_weight * log(1 + occurrence_count)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;Now repeated signals matter more than isolated ones.&lt;/p&gt;
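&lt;p&gt;The formula is easy to sanity-check in isolation. A hypothetical standalone version using math.log (the weight value is made up):&lt;/p&gt;

```python
import math

# Sketch of the frequency-weighted adjustment from the post:
# adjustment = feedback_weight * log(1 + occurrence_count)
def adjustment(feedback_weight, occurrence_count):
    return feedback_weight * math.log(1 + occurrence_count)

# A single occurrence contributes far less than a repeated signal:
print(round(adjustment(-0.2, 1), 3))   # one-off feedback
print(round(adjustment(-0.2, 10), 3))  # repeated feedback
```

&lt;p&gt;The logarithm gives diminishing returns, so even heavily repeated feedback can’t grow the penalty without bound.&lt;/p&gt;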

&lt;ol start="3"&gt;
&lt;li&gt;Scoped feedback by job context&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;A candidate rejected for one role shouldn’t be penalized globally.&lt;/p&gt;

&lt;p&gt;So I started indexing feedback like:&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;key = (job_role, skill_cluster)
&lt;/code&gt;&lt;/pre&gt;
&lt;/div&gt;

&lt;p&gt;instead of just resume similarity.&lt;/p&gt;
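&lt;p&gt;A minimal sketch of that scoping, using an in-memory store keyed the same way. The store and helper names are illustrative, not the actual implementation:&lt;/p&gt;

```python
from collections import defaultdict

# Hypothetical context-scoped feedback store: events are keyed by
# (job_role, skill_cluster) rather than raw resume similarity.
feedback_store = defaultdict(list)

def record_feedback(job_role, skill_cluster, event):
    feedback_store[(job_role, skill_cluster)].append(event)

def feedback_for(job_role, skill_cluster):
    # Feedback from other roles/clusters never leaks in.
    return feedback_store.get((job_role, skill_cluster), [])

record_feedback("backend_engineer", "python", {"feedback": "not_relevant"})
print(feedback_for("backend_engineer", "python"))   # one event
print(feedback_for("data_scientist", "python"))     # empty: scoped out
```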

&lt;p&gt;Before vs after&lt;/p&gt;

&lt;p&gt;Before:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;One bad feedback → affects many future candidates&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Silent score shifts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Hard to debug&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;After:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;Feedback only applies in tight contexts&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Multiple signals required to shift behavior&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Logs clearly show why scores change&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;Now when a candidate is penalized, I can point to:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;specific past events&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;similarity threshold&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;adjustment weight&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;
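&lt;p&gt;One way to keep those three things auditable is a structured log entry per adjustment. This is a hypothetical sketch; the field names mirror the list above but are my own:&lt;/p&gt;

```python
import json

# Hypothetical structured trace for each score adjustment, so a
# penalized candidate can be traced back to the events, threshold,
# and weight that produced it.
def explain_adjustment(events, threshold, weight, delta):
    return json.dumps({
        "replayed_events": [e["id"] for e in events],
        "similarity_threshold": threshold,
        "feedback_weight": weight,
        "applied_delta": delta,
    })

print(explain_adjustment([{"id": "evt_42"}], 0.85, -0.2, -0.139))
```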

&lt;p&gt;What Hindsight actually gave me&lt;/p&gt;

&lt;p&gt;The biggest shift wasn’t accuracy—it was behavior.&lt;/p&gt;

&lt;p&gt;Without Hindsight:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The agent is static&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bugs are code bugs&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;With Hindsight:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;The agent evolves&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Bugs become behavioral drift&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;That’s a very different debugging problem.&lt;/p&gt;

&lt;p&gt;If you’re curious, the concept is explained well in this agent memory overview on Vectorize.&lt;/p&gt;

&lt;p&gt;A concrete example&lt;/p&gt;

&lt;p&gt;A recruiter gives feedback:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“This candidate looks good on paper but lacks real project depth.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That gets stored.&lt;/p&gt;

&lt;p&gt;Later, a similar resume comes in:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;same keywords&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;similar experience level&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;The system:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;retrieves past feedback&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;applies a small negative adjustment&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;slightly lowers rank&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;After multiple similar feedback events, the agent starts implicitly learning:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;“Keyword match isn’t enough—depth matters.”&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;No retraining. Just accumulated corrections.&lt;/p&gt;
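&lt;p&gt;A toy illustration of how those corrections accumulate under the log(1 + n) weighting from earlier (the weight value here is made up):&lt;/p&gt;

```python
import math

# Toy illustration of "accumulated corrections": each repeated
# "lacks project depth" signal strengthens the penalty, but with
# diminishing returns from the logarithm.
def depth_penalty(occurrences, feedback_weight=-0.1):
    return feedback_weight * math.log(1 + occurrences)

for n in (1, 3, 10):
    print(n, round(depth_penalty(n), 3))
```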

&lt;p&gt;What I learned&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;&lt;p&gt;Feedback loops are brittle by default&lt;br&gt;
One bad signal can poison future decisions if you don’t gate it carefully.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Similarity is everything&lt;br&gt;
If your retrieval is loose, your learning is noisy. Tightening similarity improved behavior more than anything else.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Logging matters more than modeling&lt;br&gt;
I didn’t change the scoring model much. I just made Hindsight visible.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;Local context beats global memory&lt;br&gt;
Scoping feedback to job role + skill cluster made the system far more stable.&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;“Learning” is just controlled bias accumulation&lt;br&gt;
Hindsight doesn’t magically learn—it just accumulates past decisions. Your job is to control how that bias spreads.&lt;/p&gt;&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;Would I do this again?&lt;/p&gt;

&lt;p&gt;Yes—but with guardrails from day one.&lt;/p&gt;

&lt;p&gt;Hindsight is powerful, but it will happily amplify your mistakes if you let it.&lt;/p&gt;

&lt;p&gt;If you treat it like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;&lt;p&gt;a suggestion system (not ground truth)&lt;/p&gt;&lt;/li&gt;
&lt;li&gt;&lt;p&gt;a contextual memory (not global truth)&lt;/p&gt;&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl3a7pcn07fxh7m9s495s.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fl3a7pcn07fxh7m9s495s.jpg" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;…it becomes a practical way to make agents adapt without retraining.&lt;/p&gt;

&lt;p&gt;Otherwise, you’ll end up debugging why your system rejected a perfect resume—and realizing it was your own feedback loop all along.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftdl5esf1kvpythycxe91.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ftdl5esf1kvpythycxe91.jpg" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;/p&gt;

</description>
      <category>ai</category>
      <category>webdev</category>
      <category>beginners</category>
      <category>tutorial</category>
    </item>
  </channel>
</rss>
