DEV Community

shalini mk

My Coding Bot Stopped Repeating Itself After I Added Hindsight Memory

"Did it seriously just do that?" I leaned forward as our coding mentor
recommended the exact problem I kept failing — not because I told it to,
but because it remembered my last four sessions and noticed the pattern
before I did.

What We Built

CodeMentor AI is a coding practice web app with one key difference from
every other platform: it remembers you. Not just your score — your actual
mistake patterns, your weak topics, your solving speed by language, across
every single session.

The memory layer is powered by Hindsight,
a persistent agent memory system by Vectorize. The LLM is Groq running
qwen/qwen3-32b. The frontend is React with Monaco Editor — the same
editor used in VS Code.

The app has five modules: a code editor for practice, a mistake-memory
tracker, an AI mentor chat, a personalized learning-path generator,
and a progress analytics dashboard. Everything is wired through
Hindsight's retain() and recall() functions.

The Problem With Every Other Coding Platform

LeetCode doesn't know you failed binary search three times this week.
HackerRank doesn't know you always mess up recursion base cases.
Every single session starts from zero.

So the "personalized" recommendations are just topic filters. There's
no agent that actually learned from watching you code. You repeat the
same mistakes because nothing is tracking the pattern.

We wanted to fix that.

How Hindsight Memory Changes Everything

Every meaningful action in CodeMentor writes a memory to
Hindsight's agent memory system via retain():

// When a student fails a problem
await hindsight.retain({
  type: "mistake_pattern",
  user: "Arun",
  pattern: "off-by-one error",
  language: "Python",
  frequency: 3,
  problems_affected: ["two-sum", "binary-search", "sliding-window"],
  timestamp: new Date().toISOString()
})

Before every AI response, the mentor recalls from memory:

// Recall before answering
const memories = await hindsight.recall(
  "what mistakes does Arun keep making in Python"
)

// Groq receives the recalled memories as context
// (groq.chat.completions.create is the Groq SDK's chat endpoint)
const response = await groq.chat.completions.create({
  model: "qwen/qwen3-32b",
  messages: [{
    role: "system",
    content: `You are CodeMentor AI. Here is what you remember
    about this student: ${JSON.stringify(memories)}
    Use this to give specific, personalized advice.`
  }, {
    role: "user",
    content: userMessage
  }]
})

The mentor doesn't guess. It knows.

The Before vs After Moment

This is the demo moment that makes judges stop scrolling.

With Memory OFF, the bot says:

"Hello! What would you like to practice today?"

With Memory ON — after recalling from Hindsight:

"Hey Arun — you've hit recursion issues twice this week.
Want to try an easier problem first to build confidence?"

Same LLM. Same prompt. The ONLY difference is the recall() call
pulling real history from Hindsight before the response is generated.

We added a toggle switch in the navbar so you can flip between
the two modes live during a demo. It's the clearest possible way
to show what persistent memory actually does.
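The toggle itself is almost no code: the only branch is whether recall() runs before the LLM call. Here is a minimal sketch of that idea — `buildSystemPrompt` and the injected `recallFn` are our own names, not part of Hindsight's API:

```javascript
// Sketch of the memory toggle. Only recall() itself comes from Hindsight;
// the helper and parameter names here are our own.
async function buildSystemPrompt(memoryEnabled, recallFn, student) {
  const base = "You are CodeMentor AI.";
  if (!memoryEnabled) return base; // Memory OFF: generic greeting path

  // Memory ON: pull real history before the response is generated
  const memories = await recallFn(
    `what mistakes does ${student} keep making`
  );
  return `${base} Here is what you remember about this student: ${JSON.stringify(memories)}`;
}
```

In the app, `recallFn` is just `hindsight.recall`, so flipping the navbar switch changes exactly one boolean.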

What We Stored in Hindsight

We retained four types of memories:

1. Problem attempts — every try, pass or fail, with error type

2. Mistake patterns — recurring issues like off-by-one, null pointer,
missing base case

3. Solved problems — language used, attempts taken, concepts covered

4. Session summaries — daily snapshots of weak and strong areas

We started by only storing solved problems. That gave us almost nothing
useful for personalization. The breakthrough came when we added mistake
patterns — suddenly the agent could say things like "you've had this
exact error 3 times" instead of giving generic advice.
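As a rough illustration, the daily session summary can be assembled from the day's attempts before being handed to retain() — the field names below follow our own schema, not anything Hindsight mandates:

```javascript
// Build a daily session-summary memory from the day's attempts.
// The schema (weak_topics, strong_topics, ...) is our own convention.
function buildSessionSummary(user, attempts) {
  const topics = passed =>
    [...new Set(attempts.filter(a => a.passed === passed).map(a => a.topic))];
  return {
    type: "session_summary",
    user,
    weak_topics: topics(false),   // topics with at least one failed attempt
    strong_topics: topics(true),  // topics with at least one pass
    timestamp: new Date().toISOString(),
  };
}

// await hindsight.retain(buildSessionSummary("Arun", todaysAttempts))
```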

What Surprised Us

We expected Hindsight to be useful for recommendations. We didn't
expect it to make the AI sound genuinely caring.

When the agent says "I noticed you haven't practiced dynamic programming
in 5 days" — it's not hallucinating. It literally recalled that from a
session summary we retained 5 days ago. That grounding makes the
responses feel trustworthy in a way RAG alone never did.

The agent memory features in Vectorize
make this pattern surprisingly easy to implement. retain() and recall()
are the whole API surface. The hard part is deciding what to store.

Lessons Learned

Retain more than you think you need. We started minimal. Adding
mistake patterns and session summaries unlocked 80% of the useful
behaviors.

The recall query is everything. Vague queries return vague memories.
"off-by-one errors in Python arrays this week" returns exactly what
you need. "user mistakes" returns noise.
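We ended up funneling every recall through a tiny query builder so nobody could accidentally ship a vague query. The helper is entirely our own; Hindsight just receives the resulting string:

```javascript
// Our own helper for keeping recall queries specific.
function recallQuery({ pattern, context, timeframe }) {
  return `${pattern} in ${context} ${timeframe}`;
}

recallQuery({
  pattern: "off-by-one errors",
  context: "Python arrays",
  timeframe: "this week",
});
// → "off-by-one errors in Python arrays this week"
```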

Show the memory working visibly. We added a Memory Log page that
shows every retain() call ever made. Users trusted the app more when
they could see what it knew about them.
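One way to build that log without depending on any listing API is to wrap retain() so every call is mirrored locally — a sketch, with the wrapper entirely our own code:

```javascript
// Wrap a Hindsight-style client so every retain() call is also appended
// to a local log the Memory Log page can render. Only retain() itself
// belongs to the client; the wrapper is ours.
function withMemoryLog(client, log) {
  return {
    ...client,
    async retain(memory) {
      log.push({ at: new Date().toISOString(), memory });
      return client.retain(memory);
    },
  };
}
```

The UI then just renders the `log` array newest-first; the wrapped client is a drop-in replacement everywhere else.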

The before/after toggle is your best demo. Nothing explains
persistent memory faster than showing the agent with it OFF vs ON
side by side. Build this into your demo flow.

Don't over-engineer the LLM prompt. The recalled memories do the
heavy lifting. A simple system prompt + recalled context outperformed
our elaborate prompt engineering attempts.

Try It

If you're building any kind of practice or coaching agent, the
retain/recall pattern here is reusable for any domain. The code
is all on GitHub.
