Sanjana S

Hindsight improved consistency in career advice

“This skill wasn’t even in the resume.”
That’s what made me stop and check the logs.

The agent had flagged a skill the user never explicitly mentioned in their latest input. At first, it looked like a hallucination. But when I traced it back through the Hindsight logs, it turned out the system had picked it up from a project the user added earlier.

That’s when I realized the system wasn’t guessing — it was remembering.

What I built

I built a simple AI career advisor to help students track their skills, projects, and internship applications. The idea wasn’t new. Plenty of tools already generate resume feedback or suggest roles.

The difference was small but intentional:
I didn’t want each interaction to be isolated.

Instead of treating every input as a fresh start, I wanted the system to build context over time and use that context when giving advice.

The stack is minimal:

  • Streamlit for the UI
  • An LLM API for generating responses
  • A memory layer using Hindsight

The first two parts are standard. The third one is where things changed.

The problem with skills

In the initial version, I relied directly on user input.

```python
user_input = {
    "skills": ["Python", "Machine Learning"],
    "projects": ["Sentiment analysis app"]
}
```

Based on this, the agent would generate recommendations like:

  • “Apply for machine learning internships”
  • “Strengthen your deep learning knowledge”

It looked fine. But it wasn’t reliable.

The issue showed up quickly when I tried edge cases. If a user added a skill without meaningful experience, the system would fully trust it. A single keyword could completely change the direction of the advice.

Someone could write “Embedded Systems” after blinking an LED once, and suddenly the agent would start recommending firmware roles.

That’s when the flaw became obvious:

Skills shouldn’t come from what users say — they should come from what they actually do.

Logging actions with Hindsight

To move away from static input, I started logging user actions as events using Hindsight.

For example:

```python
hindsight.log_event(
    user_id=user_id,
    event_type="project_added",
    payload={
        "title": "ESP32 LED Blink",
        "tech": ["ESP32", "C"]
    }
)
```

And for applications:

```python
hindsight.log_event(
    user_id=user_id,
    event_type="internship_applied",
    payload={
        "role": "Frontend Intern",
        "result": "rejected"
    }
)
```

Instead of a snapshot, I now had a timeline.

If you look at the Hindsight documentation, this approach is closer to event-based memory than traditional chat history. You store structured actions and reconstruct context when needed.

Deriving skills instead of trusting them

Once events were in place, I stopped using the skills field directly. Instead, I derived skills from user activity.

```python
events = hindsight.get_events(user_id)

skills = set()
for event in events:
    if event.type == "project_added":
        skills.update(event.payload.get("tech", []))
```

This change was small in code but significant in behavior.

Now:

  • Skills are based on projects and actions
  • Repeated usage reinforces certain skills
  • Random additions don’t immediately affect output

It also made the system stricter. Users couldn’t just add a keyword and expect instant changes.
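The "repeated usage reinforces skills" part can be expressed as a simple frequency threshold. This is my own sketch, not Hindsight's API: `derive_skills` and the dict-shaped events are illustrations of the idea.

```python
from collections import Counter

def derive_skills(events, min_occurrences=2):
    """Count how often each technology appears across project events,
    and only treat a skill as established once it recurs."""
    counts = Counter()
    for event in events:
        if event["type"] == "project_added":
            counts.update(event["payload"].get("tech", []))
    return {skill for skill, n in counts.items() if n >= min_occurrences}

events = [
    {"type": "project_added", "payload": {"tech": ["Python", "Flask"]}},
    {"type": "project_added", "payload": {"tech": ["Python", "SQL"]}},
    # A one-off embedded project stays below the threshold.
    {"type": "project_added", "payload": {"tech": ["Embedded Systems"]}},
]
```

With `min_occurrences=2`, only the repeated Python work counts as a skill; the single embedded project does not flip the advice.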

Using memory in prompts

The next challenge was deciding how much memory to include.

Passing all events into the prompt didn’t work. It introduced noise and made responses inconsistent.

So I limited retrieval:

```python
relevant_events = hindsight.query(
    user_id=user_id,
    limit=10
)

context = format_events(relevant_events)
```

Then:

```python
response = llm(context + user_query)
```

This approach keeps things manageable:

  • Only recent or relevant events are included
  • The prompt stays within limits
  • The model gets enough context to adjust responses
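For completeness, `format_events` is nothing fancy. A minimal sketch, assuming events arrive as plain dicts (the shape here is my assumption, not Hindsight's exact return type):

```python
def format_events(events):
    """Render recent events as compact, model-readable lines."""
    lines = []
    for event in events:
        payload = ", ".join(f"{k}={v}" for k, v in event["payload"].items())
        lines.append(f"- {event['type']}: {payload}")
    return "Recent activity:\n" + "\n".join(lines) + "\n\n"
```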

For a broader view of how this fits into agent systems, the Vectorize agent memory page gives a good overview.

What actually changed

Before adding memory:

  • The system reacted only to current input
  • Skills were static and user-defined
  • Same input produced the same output

After adding Hindsight:

  • The system considers past behavior
  • Skills evolve over time
  • Same input produces different output depending on history

Example:

Before:
“Apply for machine learning internships.”

After:
“You’ve listed ML, but your projects don’t reflect it yet.”

The difference is subtle, but important. The second response is grounded in history, not just input.

Where it got interesting

The most interesting behavior showed up when inputs conflicted.

For example:

  • A user builds backend-heavy projects
  • Applies to frontend roles and gets rejected
  • Adds “React” as a skill

A stateless system would immediately switch to frontend recommendations.

With Hindsight, the response became more cautious:

“You’ve recently added React, but your past work is still backend-focused. Consider building a frontend project before applying again.”

This wasn’t hardcoded. It emerged from combining:

  • Event history
  • Derived skills
  • Prompt logic
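The prompt logic behind that caution can be sketched as a comparison between declared and derived skills. This `build_advice_prompt` helper is a hypothetical illustration of the pattern, not the exact code:

```python
def build_advice_prompt(declared_skills, derived_skills, user_query):
    """Flag declared skills the project history doesn't back up,
    so the model hedges instead of pivoting immediately."""
    unproven = set(declared_skills) - set(derived_skills)
    parts = [f"Skills backed by projects: {sorted(derived_skills)}"]
    if unproven:
        parts.append(
            f"Declared but not yet demonstrated: {sorted(unproven)}. "
            "Be cautious about recommendations that rely on these."
        )
    parts.append(f"User question: {user_query}")
    return "\n".join(parts)
```

Given derived skills of `{"Python"}` and a declared `"React"`, the prompt explicitly marks React as unproven, which is what nudges the model toward the "build a frontend project first" style of answer.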

A subtle bug memory exposed

Before memory, resume feedback was simple:

```python
def generate_resume_feedback(resume_text):
    return llm(resume_text)
```

After integrating Hindsight:

```python
def generate_resume_feedback(resume_text, user_id):
    events = hindsight.get_events(user_id)
    context = summarize(events)
    return llm(context + resume_text)
```

The same resume started getting slightly different feedback over time.

At first, I thought it was randomness. But it turned out the system was incorporating past outcomes:

  • Rejections → more critical feedback
  • Improved projects → more positive suggestions

This made responses more contextual, but also harder to debug.
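The `summarize` helper is where past outcomes leak into feedback. A rough sketch of what such a summary could look like, again using assumed dict-shaped events rather than Hindsight's actual objects:

```python
def summarize(events):
    """Condense outcomes into a short context string, so rejections
    and new projects tilt the tone of resume feedback."""
    rejections = sum(
        1 for e in events
        if e["type"] == "internship_applied"
        and e["payload"].get("result") == "rejected"
    )
    projects = sum(1 for e in events if e["type"] == "project_added")
    return (
        f"The user has {projects} logged project(s) and "
        f"{rejections} rejected application(s). "
        "Calibrate feedback accordingly.\n\n"
    )
```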

Tradeoffs

Adding memory improved consistency, but introduced friction.

Debugging got harder
You can no longer reproduce behavior from the input alone; you need the event history too.

Data structure matters
If events aren’t consistent, everything breaks.

Prompt balancing is tricky
Too much context adds noise. Too little removes value.

Behavior needs explanation
If responses change, users need to know why.

What I learned

  • Don’t rely on user-declared skills
  • Store structured events early
  • Derive state instead of trusting input
  • Retrieve selectively, not everything
  • Make reasoning visible in responses

What I’d change next

If I extend this further, I’d focus on:

  • Adding recency weighting
  • Tracking richer outcomes (interviews, shortlists)
  • Introducing a skill confidence score

Right now, skills are binary. In reality, they should evolve gradually.
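Recency weighting and a confidence score combine naturally. An exponential-decay sketch of what this could look like (the half-life, the formula, and the `timestamp` field are assumptions, not something I've shipped):

```python
import math
import time

def skill_confidence(events, skill, half_life_days=90, now=None):
    """Each project using a skill adds weight; older projects count
    less. Maps total weight into [0, 1) via 1 - exp(-weight)."""
    now = now or time.time()
    total = 0.0
    for event in events:
        if event["type"] != "project_added":
            continue
        if skill not in event["payload"].get("tech", []):
            continue
        age_days = (now - event["timestamp"]) / 86400
        total += 0.5 ** (age_days / half_life_days)
    return 1 - math.exp(-total)
```

A skill used in a project yesterday scores higher than one last touched six months ago, and a skill with no projects at all scores zero, which is exactly the gradual evolution the binary version lacks.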

Closing

The biggest improvement didn’t come from better prompts or a better model.

It came from changing how the system thinks about memory.

Once it stopped reacting to single inputs and started using patterns over time, the advice became more consistent.

Not perfect — but grounded enough to be useful.

And that turned out to matter more than anything else.
