I Built an AI Coding Mentor That Watches You Code and Never Forgets — Here's What Happened

"Hindsight flagged you use nested loops when a hash map would be O(n). This challenge builds that reflex."

That message showed up automatically in our app — not because we hardcoded it, not because we wrote special logic for it. It showed up because Hindsight had been watching, retaining, and synthesizing everything that happened in the previous session.

That's the moment I knew we'd actually built something real.

The Problem That Wouldn't Leave Me Alone

Every coding platform I've used has the same silent flaw. You struggle with recursion on Monday. You come back on Wednesday. The platform has no idea. You get another recursion problem, no context, no continuity. It treats you like a stranger every single time.

I'm Vatsal Joshi, and during the Vibe with India Hackathon, my team leader Mansi Sharma and I decided that was worth fixing. We built CodeMentor AI — an AI coding mentor with genuine persistent memory across every session, across every language, across every mistake you've ever made.

What We Actually Built

CodeMentor AI has three core sections that work together:

Mentor Chat — A conversational AI tutor that answers questions, explains concepts step by step, completes your code, and debugs your logic. What makes it different: it already knows your history before you type a single word.

Challenges — A smart challenge engine that doesn't just give you random problems. It looks at your mistake patterns first, then generates something targeted. The challenge page literally shows you why you're getting this problem — "Hindsight flagged you use nested loops when a hash map would be O(n). This challenge builds that reflex."

Progress — A dashboard showing your mistake patterns tracked by Hindsight, your personalized learning path (Arrays → String Manipulation → Hash Maps → Stacks → Recursion → Trees), memories stored via retain(), session time, languages used, and a Live Memory Bank panel that shows Hindsight operations firing in real time.

CodeMentor AI supports Python, JavaScript, TypeScript, Java, C++, and Go, and you can switch languages mid-session while keeping your memory intact.

The Memory Architecture

Hindsight by Vectorize is doing all the heavy lifting on memory. The three operations we use are retain(), recall(), and reflect() — and understanding the difference between them was the key design insight.

retain() is called every time something meaningful happens — a mistake, a solved problem, a pattern the AI notices. It stores that as a fact in the student's personal memory bank.

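# Assumes `client` is an already-initialized Hindsight client.
def retain_mistake(user_id, language, problem, mistake_description):
    """Store one mistake as a fact in the student's personal memory bank."""
    bank_id = f"codemind-user-{user_id}"  # one bank per student
    content = (
        f"Mistake in {language}: Problem was '{problem}'. "
        f"The mistake was: {mistake_description}"
    )
    client.retain(bank_id=bank_id, content=content)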

recall() is called before every challenge generation and every mentor chat response. It pulls semantically relevant memories from everything stored so far.

# bank_id is the same per-student ID used in retain_mistake
history = client.recall(
    bank_id=bank_id,
    query=f"mistakes and weak areas in {language}"
)

reflect() is the one that surprised me most. Instead of returning raw memories, it synthesizes them into a coherent understanding. Rather than dumping 20 mistake records into the AI prompt, we call reflect() and get back something like: "Student consistently reaches for O(n²) solutions before considering hash map optimizations." That's what actually gets injected into the mentor's context.
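
A minimal sketch of that injection step, under stated assumptions: reflect() is assumed to take bank_id and query the same way recall() does, and its result is assumed to be either a plain string or an object with a text attribute. MemoryClient, reflect_profile, and build_mentor_prompt are hypothetical names for illustration, not the project's actual code.

```python
from typing import Any, Protocol


class MemoryClient(Protocol):
    """Hypothetical minimal interface; the real Hindsight client may differ."""

    def reflect(self, bank_id: str, query: str) -> Any: ...


def reflect_profile(client: MemoryClient, user_id: str, language: str) -> str:
    """Ask the memory layer for a synthesized summary instead of raw records."""
    bank_id = f"codemind-user-{user_id}"
    result = client.reflect(
        bank_id=bank_id,
        query=f"What are this student's recurring mistakes and weak areas in {language}?",
    )
    # Assumption: the response is either a plain string or exposes `.text`.
    return result if isinstance(result, str) else getattr(result, "text", str(result))


def build_mentor_prompt(profile: str, question: str) -> str:
    """Inject the synthesized profile into the mentor's context."""
    return (
        "You are a coding mentor.\n"
        f"What you know about this student: {profile}\n"
        f"Student question: {question}"
    )
```

The point of the split is that the mentor prompt only ever sees one short profile string, no matter how many facts the bank has accumulated.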

The Hindsight documentation covers the whole API, and the surface is smaller than you'd expect for what it does.

The Live Memory Bank — Our Favorite Feature

On the right side of every page, there's a Live Memory Bank panel. It shows a real-time API monitor: every RECALL and RETAIN operation is logged visibly as it fires while the student uses the app.

This wasn't just a UI choice. It was a deliberate decision to make memory visible. Most AI tools have memory as a black box. We wanted students to actually see their learning being stored. The API monitor shows entries like:

[23:21:48] RECALL Challenge loaded — targeting: arrays, hash-map, O(n)
[23:21:48] RETAIN Keys loaded from storage — Hindsight live
[23:24:02] ERROR retain: failed to fetch

Yes, we left the errors in. Real systems have errors. Students should see that too. The agent memory system from Vectorize handles the underlying consolidation automatically — we just expose what's happening.
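
One way a monitor like this could be wired up is sketched below. It is an illustration under assumptions, not the app's actual implementation: MemoryMonitor, log, and track are hypothetical names, and the log format simply mirrors the entries shown above.

```python
from datetime import datetime
from typing import Any, Callable, List


class MemoryMonitor:
    """Collects one log line per memory operation for display in a side panel."""

    def __init__(self, now: Callable[[], datetime] = datetime.now) -> None:
        self.entries: List[str] = []
        self._now = now  # injectable clock, which keeps the class easy to test

    def log(self, op: str, message: str) -> None:
        stamp = self._now().strftime("%H:%M:%S")
        self.entries.append(f"[{stamp}] {op} {message}")

    def track(self, op: str, message: str, call: Callable[[], Any]) -> Any:
        """Run a memory call and record either its success or its failure."""
        try:
            result = call()
        except Exception as exc:
            # Failures become visible log entries; the caller gets None.
            self.log("ERROR", f"{op.lower()}: {exc}")
            return None
        self.log(op, message)
        return result
```

A call site would then look like monitor.track("RETAIN", "Mistake stored", lambda: retain_mistake(...)), with the panel rendering monitor.entries.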

Multi-Language, One Memory

One thing that took thought was language switching. A student might solve Two Sum in Python, then want to try the same concept in JavaScript or Go. We didn't want to create separate memory banks per language — that would fragment the learning story.

Instead, the memory bank is per student. Language is just metadata on each retained fact (in our case, written into the text of the fact itself). When we recall for a C++ challenge, we query "mistakes and patterns in C++", and Hindsight's semantic search finds the relevant entries even if they were stored with slightly different phrasing.

The result: switching from Python to TypeScript to Go mid-session, the challenge engine still knows you've been struggling with hash map usage. The context survives the language switch.
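
As a rough sketch of that design, the bank ID depends only on the student, while the language shows up only in the query text. The helper names here are hypothetical; the recall() call uses the same bank_id and query arguments shown earlier.

```python
def bank_id_for(user_id: str) -> str:
    """One bank per student; the language never appears in the bank ID."""
    return f"codemind-user-{user_id}"


def recall_query(language: str) -> str:
    """Language only shapes the query text, not which bank is searched."""
    return f"mistakes and patterns in {language}"


def recall_for_challenge(client, user_id: str, language: str):
    # Same bank for every language, so memories retained during a Python
    # session stay reachable when the student switches to C++ or Go.
    return client.recall(bank_id=bank_id_for(user_id), query=recall_query(language))
```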

The Catppuccin Detail

We spent more time on theming than I'd like to admit. The app ships with both Catppuccin Mocha (dark) and Catppuccin Latte (light). This wasn't decoration — coding is a long-session activity. Visual comfort matters. We wanted something that felt like a tool a developer would actually choose to use, not a hackathon prototype.

What Surprised Us

Two things caught us off guard during the build.

First, how quickly Hindsight's observation consolidation starts producing useful signal. After just a few interactions, reflect() was already generating coherent student profiles. We expected this to need many more data points.

Second, the challenge reasoning. We didn't explicitly code "if student has nested loop mistakes, give hash map challenge." We just passed the reflected history into the challenge generator and asked it to target weak areas. The "Why this challenge?" explanation — "Hindsight flagged you use nested loops when a hash map would be O(n)" — comes entirely from the AI reasoning over the memory context. We didn't template that text.
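
To make that concrete, here is a hedged sketch of what such a generator prompt could look like. The JSON response shape, the key names, and both function names are assumptions for illustration; the source does not specify how the generator's output is structured. Note that nothing here maps a mistake type to a challenge type.

```python
import json


def build_challenge_prompt(profile: str, language: str) -> str:
    """Hand the model the reflected profile and let it pick the target itself."""
    return (
        f"Student profile (from memory): {profile}\n"
        f"Generate one {language} coding challenge that targets the student's "
        "weakest area. Respond as JSON with keys 'title', 'description', and "
        "'why_this_challenge'."
    )


def parse_challenge(raw: str) -> dict:
    """Parse the model's reply, filling in a generic reason if none was given."""
    data = json.loads(raw)
    data.setdefault("why_this_challenge", "Targets a recent weak area.")
    return data
```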

Lessons From the Build

reflect() beats recall() for context injection. Raw memories are noise at scale. Synthesized understanding is signal.

Make memory visible. The Live Memory Bank panel changed how testers engaged with the app. When you can see your learning being stored, the product feels more trustworthy.

Language is metadata, not identity. Don't shard memory by language. Students are the unit of memory, not their current language choice.

Build the memory layer first, test it in isolation, then build UI on top. We spent a day just running retain/recall/reflect in a Python script before touching the frontend. That saved hours.

Try It

The full project is open source:
👉 github.com/MansiSharma11/CodeMentor-AI

If you're building anything that needs to learn from users over time — a tutor, a coach, a support bot — I'd genuinely look at Hindsight. The three-method API is deceptively powerful once the memory bank starts accumulating observations.

Built by Vatsal Joshi and Mansi Sharma for the Vibe with India Hackathon.