I used to think I was bad at recursion. Turns out, I wasn't bad at recursion — I just kept forgetting the same base case error every single time I opened a new tab on LeetCode.
That realization hit me hard one Sunday afternoon when I failed the exact same Fibonacci problem I had failed three weeks earlier on a different platform. Same error. Same confusion. Same wasted hour. The platform had no idea. It just cheerfully served me another random problem like nothing had happened.
That's when I decided to build something different.
The Problem With Every Coding Platform You've Used
LeetCode, HackerRank, Codecademy — I have used them all. They are brilliant for volume. But they all share one fatal flaw: they have the memory of a goldfish.
Every session starts from zero. The platform has no idea you struggled with off-by-one errors in binary search last Tuesday. It doesn't know you have a habit of forgetting empty list handling in Python. It doesn't connect your past failures to your future challenges. You are essentially paying to repeat your own mistakes indefinitely.
I call this Context Amnesia. And it is the single biggest reason developers hit learning plateaus that last months.
The fix wasn't complicated in theory. What if the AI actually remembered?
What I Built
CodeMentor AI is a full-stack coding practice platform that uses Hindsight — an open source agent memory system by Vectorize — to remember every mistake a user makes across every session, forever.
When you submit wrong code, the mistake is saved to Hindsight memory. When you log back in two days later, the AI recalls those exact mistakes and generates a problem specifically targeting your weak spots. After 5 sessions, it reflects on patterns. After 10 sessions, it builds a complete mental model of how you think as an engineer.
The result is an AI mentor that genuinely gets smarter the more you use it — not just within a session, but across weeks and months.
Live demo: ai-coding-mentor-eight.vercel.app
GitHub: github.com/amankr2776/ai-coding-mentor
How Hindsight Memory Powers the Entire App
This is the part I am most proud of. Hindsight's agent memory system isn't bolted on as a feature — it is the architectural backbone of CodeMentor AI. Every intelligent behavior in the app flows from these four functions.
retain() — Every Mistake Becomes an Asset
The moment a user submits wrong code, we don't just show an error. We archive it:
```typescript
await hindsight.retain(
  `User practiced ${topic} in ${language}.
Result: failure.
Error: ${rootCause}`,
  { type: 'failure', topic, language, difficulty, title }
);
```
We also call retain when a user asks for a hint (meaning they got stuck), when they fail a quiz question, and when they take longer than 15 minutes on a problem. Every struggle becomes a data point in their personal learning matrix.
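Those extra struggle signals can all funnel through one helper so every event is retained with the same structure. A minimal sketch — the `MemoryClient` interface is a simplified stand-in for the real Hindsight SDK client, and the helper name is mine:

```typescript
// Simplified stand-in for the Hindsight client's retain(content, metadata) call.
interface MemoryClient {
  retain(content: string, metadata: Record<string, unknown>): Promise<void>;
}

type StruggleKind = 'hint_requested' | 'quiz_failed' | 'slow_solve';

// One funnel for every struggle signal (hypothetical helper), so each
// event lands in memory with the same shape and stays useful for recall.
async function retainStruggle(
  client: MemoryClient,
  kind: StruggleKind,
  topic: string,
  language: string,
  detail: string
): Promise<void> {
  await client.retain(
    `User struggled (${kind}) on ${topic} in ${language}. Detail: ${detail}`,
    { type: kind, topic, language }
  );
}
```

Keeping the content string and metadata shape identical across all three triggers is what lets a single recall query later surface every kind of struggle at once.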
recall() — Personalized Problems, Not Random Ones
Before generating any new problem, the AI mentor recalls your history:
```typescript
const memories = await hindsight.getHistory();
const weaknessSummary = memories
  .slice(0, 20)
  .map(m => m.content)
  .join("\n");
// This context gets injected into the Groq Llama-3 prompt
```
If you failed Dynamic Programming problems three times this week, the system clusters those failures and generates a challenge that directly targets that gap. Not a random problem. Your problem.
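The injection step itself is just prompt assembly. A sketch of how the recalled summary could be folded into the generation prompt — the function name and wording are illustrative, not the app's exact prompt:

```typescript
// Build the problem-generation prompt with the user's recalled
// weaknesses injected as context. `weaknessSummary` is the joined
// memory contents from the recall step above.
function buildProblemPrompt(topic: string, weaknessSummary: string): string {
  return [
    `Generate one ${topic} coding problem.`,
    `The user has these known weak spots from past sessions:`,
    weaknessSummary,
    `Target the problem at those weaknesses. Respond in JSON with`,
    `{"title": "...", "description": "...", "difficulty": "..."}`,
  ].join('\n');
}
```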
reflect() — Finding Patterns You Cannot See Yourself
Every 5 submissions, the app triggers a reflection:
```typescript
await client.reflect('abhimanu', 'summarize user learning patterns');
```
Hindsight's reflect function analyzes the raw memories and synthesizes higher-level observations like: "User is proficient in Python syntax but consistently fails at space complexity optimization in matrix problems."
These observations show up on the Neural Insights dashboard and power the smart notifications — things like "You haven't practiced recursion in 3 days" or "You are close to mastering arrays."
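The notification side is simple logic layered on top of that memory. A sketch of the staleness check — the three-day threshold and function name are illustrative, not the app's actual values:

```typescript
// Turn recency data into a "smart notification" string for the
// dashboard, or null when no nudge is needed. Threshold is illustrative.
function staleTopicNotification(
  topic: string,
  daysSincePractice: number
): string | null {
  return daysSincePractice >= 3
    ? `You haven't practiced ${topic} in ${daysSincePractice} days`
    : null;
}
```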
createMentalModel() — A Technical Profile That Evolves
Every 10 interactions, the app builds a formal mental model:
```typescript
await client.createMentalModel(
  'abhimanu',
  'User Learning Profile',
  'What are the key learning patterns and weak areas of this user?'
);
```
This becomes the "system prompt override" for the AI mentor — ensuring it talks to you like a coach who knows your entire history, not a chatbot that just woke up.
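Wiring that override in is a matter of putting the model text into the system message on every chat turn. A sketch using the OpenAI-style message shape that Groq's chat API accepts — the base prompt wording and function name are mine:

```typescript
type ChatMessage = { role: 'system' | 'user' | 'assistant'; content: string };

// Prepend the stored mental model to the system message so the mentor
// never starts a conversation from a blank slate. Illustrative sketch.
function mentorMessages(mentalModel: string, userMessage: string): ChatMessage[] {
  return [
    {
      role: 'system',
      content:
        "You are a coding mentor who knows this user's full history.\n\n" +
        `User learning profile:\n${mentalModel}`,
    },
    { role: 'user', content: userMessage },
  ];
}
```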
The Part That Surprised Me
I want to be honest: I did not expect Hindsight to store as much rich, structured information as it did.
When I first tested the app and checked the Hindsight Cloud dashboard, I was genuinely shocked. The World Facts tab had 61 memories. Experience had 77. Observations had 44 synthesized patterns. All from just a few hours of testing.
The memory graph looked like a neural network — nodes connected by semantic, temporal, entity, and causal links. Hindsight wasn't just storing strings. It was building a knowledge graph of the user's learning behavior automatically, without me writing any extra code.
That was the moment I understood what agent memory actually means. It is not a database. It is a living model of a person's knowledge state.
Building the Practice Page: The Hardest Part
I handled the frontend, styling, and memory integration. The practice page was by far the hardest thing I built.
The challenge was making the code evaluation feel intelligent rather than mechanical. Early versions used exact string matching — if your output didn't match the expected output character for character, you got marked wrong. That was terrible. A trailing newline would fail a perfectly correct solution.
The final approach uses Groq to evaluate semantically:
```typescript
const prompt = `Evaluate this ${language} code for the problem.
Check line by line for syntax errors, logic errors, wrong output.
Respond in JSON: {"passed": true/false, "feedback": "...",
"correctCode": "...", "mistakes": [...]}`;
```
When code is wrong, it shows a side-by-side comparison with the correct solution, highlights the exact lines that differ, and saves the specific mistake to Hindsight so next time the AI knows exactly where you struggled.
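One practical wrinkle: LLMs sometimes wrap their JSON in markdown fences or extra prose, so the reply needs defensive parsing before you can trust `passed` and `mistakes`. A sketch of how that parsing could look — the interface mirrors the JSON shape described above, and the helper name is mine:

```typescript
// Shape of the evaluator's JSON reply, as described in the prompt.
interface EvalResult {
  passed: boolean;
  feedback: string;
  correctCode: string;
  mistakes: string[];
}

// Pull the first {...} block out of the raw model reply and parse it.
// Returns null on garbage so the caller can retry instead of crashing.
function parseEvaluation(raw: string): EvalResult | null {
  const match = raw.match(/\{[\s\S]*\}/);
  if (!match) return null;
  try {
    return JSON.parse(match[0]) as EvalResult;
  } catch {
    return null;
  }
}
```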
What I Would Do Differently
The biggest lesson: call retain with richer content.
Early on I was saving minimal strings like "User failed problem." That's useless for recall. The memories that actually powered good personalization were the detailed ones — the ones that explained the topic, the language, the specific error type, and the root cause analysis from Groq.
The quality of recall is entirely dependent on the quality of what you retain. Garbage in, garbage out — but with Hindsight, good data in means genuinely intelligent responses out.
The Stack
Frontend: Next.js 15 with Tailwind CSS
Memory: Hindsight by Vectorize — the open source agent memory system
LLM: Groq API with Llama-3.3-70b-versatile
Code Execution: Piston API (free, no key needed)
Deployment: Vercel
Try It Yourself
The live demo is at ai-coding-mentor-eight.vercel.app. Submit a few wrong solutions and then check how the next problem changes. The memory effect becomes visible within 3-4 sessions.
If you want to build something similar, start with the Hindsight documentation and the open source repo. The SDK is clean and the retain/recall pattern is surprisingly simple to implement. The hard part is deciding what to remember — which turns out to be a product design question more than a technical one.
The code is open source at github.com/amankr2776/ai-coding-mentor. If you build something with Hindsight, I'd genuinely love to see what patterns you discover in your users' memory banks.
Aman Kumar — built the frontend, styling, and Hindsight memory integration for CodeMentor AI.
Team HACKONAUT