Karan Patil
How Hindsight Gave My Agent a Real Memory

"It actually remembered." My teammate stared at the screen as CodeMentor AI greeted her with — "Last time you struggled with hash maps. Want to try a problem that builds on that?" We hadn't written a single line of personalization code.

That was somewhere around 1:30 AM. I was in Pune, running on chai and stubbornness, and this was probably my third hackathon. I had been building for about 14 hours straight using AI tools to move fast — and honestly, even I didn't fully expect it to work this well.

Try it live: codemento.netlify.app


How This Started

I'll be straight with you — I didn't know what Hindsight was when the hackathon started. The problem statement said "build AI agents that learn using Hindsight" and I opened the docs, got confused, and ended up just asking Claude to explain it to me like I was five.

Turns out it's actually simple. Hindsight is a memory layer for AI agents. Three operations: store a memory, fetch relevant memories, reason over them. That's it. Once I understood that, the whole project clicked.
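To make the mental model concrete, here's a toy in-memory version of that loop. This is not the Hindsight client, and the class and method internals are purely illustrative; the real service does semantic retrieval and LLM-backed reasoning instead of the naive string matching here.

```typescript
// Toy illustration of the store / fetch / reason loop.
// NOT the real Hindsight API -- just the mental model.
type Memory = { content: string; tags: string[] };

class ToyMemoryBank {
  private memories: Memory[] = [];

  // retain: store a memory
  retain(content: string, tags: string[] = []): void {
    this.memories.push({ content, tags });
  }

  // recall: fetch relevant memories (naive keyword match here;
  // the real service does far richer retrieval)
  recall(query: string, limit = 5): Memory[] {
    const words = query.toLowerCase().split(/\s+/);
    return this.memories
      .filter(m => words.some(w => m.content.toLowerCase().includes(w)))
      .slice(0, limit);
  }

  // reflect: reason over memories (the real reflect() runs an LLM;
  // here we just concatenate what recall found)
  reflect(question: string): string {
    return this.recall(question, 3).map(m => m.content).join('; ');
  }
}
```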

The idea was this: every AI coding tool forgets you the moment you close the tab. I wanted to build one that didn't. A coding mentor that remembers your worst habits, your patterns, your progress — and actually uses that to help you get better.

I called it CodeMentor AI.


What I Actually Built

The app looks like VS Code had a baby with an AI tutor. Monaco Editor in the center, problem list on the left, AI mentor chat on the right. 75 LeetCode-style problems, live test case evaluation, a skill radar chart showing your strengths, and a 3D animated background because why not.


Main interface — VS Code-style editor with problem list, complexity analyzer, and AI mentor sidebar

The whole thing runs in the browser — React + Vite + TypeScript. Groq's llama-3.3-70b for the AI mentor because it's fast and free. And Hindsight for the part that actually makes it interesting.

I built most of it using AI coding agents. Not because I'm lazy — because the hackathon had a submission deadline and I was one person trying to ship something real. The Hindsight integration though, I wrote carefully myself. That was the whole point of the project.


The Skill Radar — What the Agent Actually Knows About You

One of the first things you'll notice is the Skills tab. It's a radar chart showing your strengths across Arrays, Strings, DP, Stacks, Math, and Sorting — all derived from your Hindsight memory.

The skill radar — built entirely from Hindsight recalled memories, not hardcoded scores

Every score on that chart comes from recall(). The agent fetches your past attempts, parses successes and failures per topic tag, and builds the visualization from real data. No hardcoded scores. No fake progress bars. It's what Hindsight actually knows about you.
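The aggregation step can be sketched like this. The field names (`type`, `topic`) are assumptions based on the retain() metadata shown later in the post, not the app's exact shape:

```typescript
// Hedged sketch: deriving per-topic radar scores from recalled attempt
// metadata. Field names and the scoring formula are illustrative.
type Attempt = { type: 'success' | 'failure'; topic: string };

function skillScores(attempts: Attempt[]): Record<string, number> {
  const stats: Record<string, { solved: number; total: number }> = {};
  for (const a of attempts) {
    const s = (stats[a.topic] ??= { solved: 0, total: 0 });
    s.total += 1;
    if (a.type === 'success') s.solved += 1;
  }
  // Score = solve rate out of 100, so the chart only reflects real outcomes.
  return Object.fromEntries(
    Object.entries(stats).map(([t, s]) => [t, Math.round((100 * s.solved) / s.total)])
  );
}
```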


The Memory Architecture — Three Functions

The entire memory system is three API calls. Here's exactly how I use each one.

retain() — After Every Code Submission

Every time someone runs their code, I call retain() with a structured summary:

```typescript
await retain(hindsightKey, bankId,
  `${username} attempted "${problem.title}" (${problem.difficulty}) in ${lang}.
   Result: ${pass ? 'SOLVED' : 'FAILED'}. Attempt #${problem.attempts}.
   ${!pass
     ? `Error: ${result.error}. Failed ${failedCount}/${total} test cases.
        Code snippet: ${code.substring(0, 400)}`
     : 'Solved successfully.'
   }`,
  {
    type: pass ? 'success' : 'failure',
    problem: problem.title,
    difficulty: problem.difficulty,
    language: lang,
    timestamp: new Date().toISOString()
  }
);
```

Important thing I learned: don't store the raw code. Store the meaning. "Karan failed Two Sum attempt 3, used brute force O(n²), missed the hash map optimization" — that's a useful memory. 400 lines of wrong JavaScript is noise.

I also call retain() when someone clicks the hint button. That tells the system they got stuck, and on what. Small signal, but it adds up over sessions.
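That hint-click call looks roughly like this. The helper name is mine, and the `retain` signature is simplified to match the submission example above:

```typescript
// Sketch of the "got stuck" signal stored when a user asks for a hint.
// Injecting retain as a parameter keeps the helper easy to test.
type RetainFn = (key: string, bank: string, content: string,
                 meta: Record<string, string>) => Promise<void>;

async function recordHintSignal(retain: RetainFn, key: string, bankId: string,
                                username: string, problemTitle: string): Promise<void> {
  await retain(key, bankId,
    `${username} requested a hint on "${problemTitle}" (stuck signal).`,
    { type: 'hint', problem: problemTitle, timestamp: new Date().toISOString() });
}
```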

recall() — Before Showing Any Guidance

When a user selects a problem, I immediately fetch relevant memories:

```typescript
const memories = await recall(
  hindsightKey,
  bankId,
  `${username} past attempts and errors on ${problem.title}`,
  5
);
```

And when the app loads, I run a broader query:

```typescript
const patterns = await recall(
  hindsightKey,
  bankId,
  `${username} overall coding patterns and common mistakes`,
  8
);
```

These memories go directly into the AI mentor's system prompt. That's how it knows your history — not magic, just accurate retrieval fed as context.

What surprised me here: Hindsight uses something called TEMPR — four search strategies running in parallel (semantic, keyword, graph, temporal). So a query like "Karan's array problems" finds memories you'd miss with pure vector search. It surfaces the related attempts, the pattern across sessions, not just exact matches.
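The general idea of fusing several ranked result lists can be illustrated with reciprocal rank fusion, a common technique for combining retrieval strategies. To be clear, this is my toy sketch of the concept, not Hindsight's actual TEMPR implementation:

```typescript
// Toy illustration of merging results from several retrieval strategies.
// This is reciprocal rank fusion, NOT how TEMPR actually works internally.
type Strategy = (query: string) => string[]; // ranked memory ids, best first

function fuseRankings(query: string, strategies: Strategy[], k = 60): string[] {
  const scores = new Map<string, number>();
  for (const run of strategies) {
    run(query).forEach((id, rank) => {
      // Each strategy contributes more score to its top-ranked results.
      scores.set(id, (scores.get(id) ?? 0) + 1 / (k + rank + 1));
    });
  }
  // Memories surfaced by several strategies rise to the top.
  return [...scores.entries()].sort((a, b) => b[1] - a[1]).map(([id]) => id);
}
```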

reflect() — Before AI Generates Feedback

Before the mentor responds to a failed submission, I call reflect():

```typescript
const reflection = await reflect(
  hindsightKey,
  bankId,
  `What are ${username}'s main weaknesses based on their coding history?`
);
```

This goes into the system prompt alongside the recalled facts. Here's what the actual AI mentor response looks like when someone is stuck:


The mentor explaining dynamic programming for Coin Change — specific and actionable, not generic

And when they ask for a hint directly:


Hint response — based on the user's actual code and past history, not a generic answer

The system prompt looks like this:

```typescript
const systemPrompt = `You are CodeMentor AI, an expert coding tutor for ${username}.

Memory context from Hindsight:
${memories.slice(0, 3).map(m => m.content).join('\n')}

Reflection on this user:
${reflection}

Current problem: ${problem.title} (${problem.difficulty})
Their code: ${code.substring(0, 400)}

Give a targeted 2-3 sentence hint. Don't give the full solution.`;
```

The response is specific because the system prompt has context. The LLM isn't guessing — it knows what you've tried, what failed, and what concept you're missing.


The Part That Caught Me Off Guard

Around 2 AM, something happened that I genuinely didn't expect.

I hadn't touched the greeting logic. There was no code that said "if user has past memories, say something personalized." But when my teammate opened the app for the second time, the AI greeted her with a message referencing her exact history from the previous session.

It worked because the recalled memories were in the context. The LLM just used them naturally. I hadn't engineered that behavior. It emerged.

That's what agent memory done properly feels like. It's not a feature you add. It's a capability that unlocks when the agent actually knows something about you.

The other thing that caught me off guard: Hindsight auto-consolidates memories into higher-level Observations in the background. After a few sessions of data, the system had synthesized "this user tends to skip edge case handling" without me writing any consolidation logic. The Hindsight docs mention this, but it didn't sink in until I saw it working.


What I'd Do Differently

Query design is everything. The quality of recall() results depends entirely on your query strings. I rewrote mine three times. "User mistakes" is useless. "User failed attempts on array problems involving sorting" is much better. Spend time here.

Test in session 3, not session 1. I kept testing by clearing my memory bank and starting fresh. That's the wrong test. The interesting behavior shows up in the third or fourth session when patterns have accumulated. I almost shipped without ever testing that scenario properly.

Don't store full code. I tried this early on. After two sessions, the context was bloated with irrelevant code snippets and the AI responses got worse. Summarize aggressively. Store decisions and outcomes, not implementations.
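A summarizer along these lines is what I mean. Everything here (the function name, truncation lengths, message shape) is illustrative rather than the app's exact code:

```typescript
// Sketch of aggressive summarization before storing a memory:
// keep the outcome and the error class, never the full source file.
function summarizeAttempt(username: string, title: string, passed: boolean,
                          error?: string, code?: string): string {
  if (passed) return `${username} solved "${title}".`;
  // First line of the error only; stack traces are noise across sessions.
  const errorLine = (error ?? 'unknown error').split('\n')[0].slice(0, 120);
  // A tiny fingerprint of the code, enough to recognize the approach.
  const fingerprint = code ? ` Code began: ${code.slice(0, 80)}` : '';
  return `${username} failed "${title}": ${errorLine}.${fingerprint}`;
}
```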

Call recall() proactively. Don't wait for the user to ask something. Fetch memories the moment context changes — problem selected, new session started, chat opened. Front-loading context is what makes responses feel intelligent.


Try It

The app is live at codemento.netlify.app.

You'll need a free Groq key from groq.com and a free Hindsight Cloud account from ui.hindsight.vectorize.io — use code MEMHACK315 for $50 in free credits.

If you're building anything with repeat users — a tutor, a support agent, a coding assistant, anything — persistent memory isn't optional. It's the difference between a tool people use once and forget, and one they actually come back to.

I built this in one night, mostly on chai, with a lot of help from Claude when I got stuck. If I can ship this in 14 hours, you can integrate Hindsight into whatever you're building this weekend.


Built at HackWithIndia 2026 — Karan Patil, Pune.

Team: Sarvesh Gajakosh, Priya Vhatkar, Krishna Hasare, Siddhi Shinde, Aashta Sawade.
