job sense ai
“Why did it reject a perfect resume?” I dug into the logs and realized Hindsight had quietly rewritten the agent’s scoring logic based on one bad feedback loop.
What I actually built
This project is a job matching agent that reads resumes, scores candidates, and ranks them against job descriptions. Nothing fancy on the surface: parse resume → extract features → score → return top candidates.
The interesting part is that the scoring logic isn’t fixed.
I wired it up with Hindsight (see its GitHub repository) so the agent could learn from feedback, things like:
“This candidate should have been ranked higher”
“This profile is irrelevant despite keyword match”
Instead of retraining a model, I let the agent adapt its behavior by replaying past decisions and corrections.
How the system is structured
At a high level, the code splits into three parts:
Resume ingestion + parsing
Scoring pipeline
Hindsight-backed memory + feedback loop
The scoring pipeline looks roughly like this:
def score_candidate(resume, job_description):
    features = extract_features(resume, job_description)
    base_score = weighted_score(features)
    adjustments = hindsight_adjustments(resume, job_description)
    return base_score + adjustments
The key is that hindsight_adjustments isn’t static. It’s derived from past feedback stored and replayed through Hindsight.
Feedback events are stored with context:
event = {
    "resume_id": resume.id,
    "job_id": job.id,
    "original_score": score,
    "feedback": "should_rank_higher",
    "timestamp": now(),
}
These get indexed and replayed later when similar candidates show up.
If you’ve read the Hindsight documentation, this is basically using event replay as a lightweight learning layer instead of retraining.
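To make the replay idea concrete, here's a minimal in-memory stand-in. The `FeedbackStore` class and its `record`/`retrieve_similar` methods are my own sketch of the pattern, not Hindsight's actual API:

```python
import time

class FeedbackStore:
    """Hypothetical stand-in for the Hindsight event store."""

    def __init__(self):
        self.events = []

    def record(self, resume_id, job_id, original_score, feedback):
        self.events.append({
            "resume_id": resume_id,
            "job_id": job_id,
            "original_score": original_score,
            "feedback": feedback,
            "timestamp": time.time(),
        })

    def retrieve_similar(self, job_id):
        # Naive "similarity": any past event on the same job qualifies.
        return [e for e in self.events if e["job_id"] == job_id]

store = FeedbackStore()
store.record("r-101", "j-7", 0.62, "should_rank_higher")
similar = store.retrieve_similar("j-7")
```

The real store also matches on resume-level similarity, which is exactly where the trouble started.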
The bug that made this interesting
Everything seemed fine until I noticed something weird:
A strong candidate—clean experience, perfect keyword match—was consistently ranked low.
At first I thought:
parsing bug?
feature extraction issue?
bad weights?
Nope.
It was Hindsight.
What actually happened
One recruiter had marked a similar resume as “not relevant” earlier. That feedback got stored and replayed.
But the similarity match was too broad.
So now:
New candidate comes in
Hindsight finds a “similar” past event
Applies a negative adjustment
Score drops silently
No logs screamed “this is wrong.” It just looked like the system “decided” differently.
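The whole failure fits in a few lines. This toy reproduction (made-up embeddings and adjustment values, not my real pipeline) shows how a loose similarity threshold lets a single negative event punish a merely similar-ish candidate:

```python
# One recruiter marked one resume "not relevant" -> one stored penalty.
past_events = [
    {"embedding": (1.0, 0.0), "adjustment": -0.3},
]

def cosine_sim(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = sum(x * x for x in a) ** 0.5
    nb = sum(x * x for x in b) ** 0.5
    return dot / (na * nb)

def adjust(candidate_emb, threshold):
    # Apply every past adjustment whose event clears the similarity bar.
    total = 0.0
    for e in past_events:
        if cosine_sim(candidate_emb, e["embedding"]) >= threshold:
            total += e["adjustment"]
    return total

strong_candidate = (0.9, 0.44)  # related, but not the same resume

loose = adjust(strong_candidate, threshold=0.5)   # broad match: penalized
tight = adjust(strong_candidate, threshold=0.95)  # strict match: untouched
```

With the loose threshold the strong candidate silently inherits the full -0.3 penalty; with the strict one, nothing happens.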
Debugging the feedback loop
I had to explicitly trace how Hindsight was influencing decisions.
I added logging like this:
def hindsight_adjustments(resume, job):
    events = hindsight.retrieve_similar(resume, job)
    for e in events:
        print("Replaying event:", e)  # trace exactly which feedback is applied
    return aggregate_adjustments(events)
That’s when it clicked:
The system wasn’t wrong
It was too eager to generalize
The feedback loop had effectively created a soft rule:
“Candidates like this are bad”
…based on a single data point.
Fixing it without killing learning
I didn’t want to remove Hindsight—it was the whole point.
Instead, I constrained it.
- Tightened similarity matching
Instead of loose matching, I added stricter filters:
if similarity_score < 0.85:
    continue
This alone reduced a lot of bad carryover.
- Weighted feedback by frequency
One-off feedback shouldn’t dominate:
adjustment = feedback_weight * log(1 + occurrence_count)
Now repeated signals matter more than isolated ones.
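A quick check of what the log damping does in practice. The weight of 0.1 is an assumed value for illustration, not a tuned parameter:

```python
import math

FEEDBACK_WEIGHT = 0.1  # assumed base weight, not a tuned value

def adjustment(occurrence_count):
    # One-off feedback produces a small nudge; repeated feedback
    # grows the adjustment, but sub-linearly, so no single cluster
    # of events can dominate the score.
    return FEEDBACK_WEIGHT * math.log(1 + occurrence_count)

one_off = adjustment(1)    # ~0.069
repeated = adjustment(10)  # ~0.240
```

Ten occurrences carry roughly 3.5x the weight of one, not 10x, which is the point: repetition matters, but never explosively.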
- Scoped feedback by job context
A candidate rejected for one role shouldn’t be penalized globally.
So I started indexing feedback like:
key = (job_role, skill_cluster)
instead of just resume similarity.
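A sketch of that scoping (the field names and key shape are illustrative, not the exact ones from my codebase):

```python
from collections import defaultdict

# Hypothetical scoped index: feedback only applies within the same
# (job_role, skill_cluster) context, never globally.
feedback_index = defaultdict(list)

def record(job_role, skill_cluster, adjustment):
    feedback_index[(job_role, skill_cluster)].append(adjustment)

def lookup(job_role, skill_cluster):
    return sum(feedback_index.get((job_role, skill_cluster), []))

record("backend_engineer", "python", -0.2)

penalty_backend = lookup("backend_engineer", "python")    # -0.2
penalty_frontend = lookup("frontend_engineer", "python")  # 0, different role
```

The same resume rejected for a backend role carries no penalty when it shows up for a frontend one.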
Before vs after
Before:
One bad feedback → affects many future candidates
Silent score shifts
Hard to debug
After:
Feedback only applies in tight contexts
Multiple signals required to shift behavior
Logs clearly show why scores change
Now when a candidate is penalized, I can point to:
specific past events
similarity threshold
adjustment weight
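That audit trail can be as simple as returning the evidence alongside the adjustment. A sketch, with invented event fields and a helper name of my own:

```python
def explained_adjustment(events, threshold=0.85, weight=0.1):
    """Return the total adjustment together with the evidence behind it."""
    applied = [e for e in events if e["similarity"] >= threshold]
    total = weight * sum(e["delta"] for e in applied)
    return {
        "adjustment": total,
        "events": [e["event_id"] for e in applied],
        "threshold": threshold,
        "weight": weight,
    }

past = [
    {"event_id": "e1", "similarity": 0.91, "delta": -1.0},
    {"event_id": "e2", "similarity": 0.60, "delta": -1.0},  # below threshold, ignored
]
result = explained_adjustment(past)
```

Now a penalized candidate's score comes with the exact event IDs, threshold, and weight that produced it, so "the system decided differently" is no longer the end of the debugging trail.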
What Hindsight actually gave me
The biggest shift wasn’t accuracy—it was behavior.
Without Hindsight:
The agent is static
Bugs are code bugs
With Hindsight:
The agent evolves
Bugs become behavioral drift
That’s a very different debugging problem.
If you’re curious, the concept is explained well in this agent memory overview on Vectorize.
A concrete example
A recruiter gives feedback:
“This candidate looks good on paper but lacks real project depth.”
That gets stored.
Later, a similar resume comes in:
same keywords
similar experience level
The system:
retrieves past feedback
applies a small negative adjustment
slightly lowers rank
After multiple similar feedback events, the agent starts implicitly learning:
“Keyword match isn’t enough—depth matters.”
No retraining. Just accumulated corrections.
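A toy version of that accumulation (all numbers invented): each correction is a small nudge, and only the running total meaningfully shifts the rank.

```python
# Repeated "lacks project depth" feedback, each a small negative nudge.
corrections = [-0.02, -0.02, -0.03, -0.02]

base_score = 0.80
running = base_score
history = []
for delta in corrections:
    running += delta
    history.append(round(running, 2))
# history -> [0.78, 0.76, 0.73, 0.71]
```

No single event changes the outcome; four consistent ones do.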
What I learned
- Feedback loops are brittle by default
One bad signal can poison future decisions if you don’t gate it carefully.
- Similarity is everything
If your retrieval is loose, your learning is noisy. Tightening similarity improved behavior more than anything else.
- Logging matters more than modeling
I didn’t change the scoring model much. I just made Hindsight visible.
- Local context beats global memory
Scoping feedback to job role + skill cluster made the system far more stable.
- “Learning” is just controlled bias accumulation
Hindsight doesn’t magically learn; it just accumulates past decisions. Your job is to control how that bias spreads.
Would I do this again?
Yes—but with guardrails from day one.
Hindsight is powerful, but it will happily amplify your mistakes if you let it.
If you treat it like:
a suggestion system (not ground truth)
a contextual memory (not global truth)
…it becomes a practical way to make agents adapt without retraining.
Otherwise, you’ll end up debugging why your system rejected a perfect resume—and realizing it was your own feedback loop all along.

