Bonamukkala Charan
The Coding Mentor That Knows Your Weaknesses Better Than You Do

The moment that changed this project for me: a student submitted bad code, got feedback, then submitted the same bad code again two minutes later. Without memory, the agent said the exact same thing. With Hindsight, it said — "We've been here before."

That single moment made me realize every coding platform I'd ever used was broken in the same way.

The Problem Nobody Talks About

LeetCode, HackerRank, CodeChef — I've used them all. They're great for practice. But they all share one fatal flaw: they forget you the moment you close the tab.

You can make the same mistake 50 times and they'll give you the exact same generic feedback every single time. No pattern recognition. No memory. No personalization. The platform has no idea you've been struggling with edge cases for three weeks straight.

This isn't a small problem. It's the core reason most self-taught developers plateau. They're getting feedback from a system that has no idea who they are.

So I built something different.

What I Built

AI Coding Mentor is a full-stack application that uses Hindsight — a persistent memory system by Vectorize — to remember every mistake you make, every pattern you repeat, and every concept you struggle with across every single session.

The stack is straightforward:

  • React frontend with 19 feature tabs
  • FastAPI Python backend with 15+ endpoints
  • Groq API (Qwen3-32B) as the LLM
  • Hindsight Cloud as the memory layer

The frontend talks to FastAPI. FastAPI talks to Hindsight to store and retrieve memories, then passes context to Groq to generate personalized responses. Simple architecture, but the memory layer changes everything.
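Put together, a single submission runs through all three layers. Here is a minimal sketch of that flow; the function and parameter names are mine, not the project's actual code, and the memory layer and LLM are injected as callables so the sequence stays visible:

```python
# Illustrative sketch of the request flow: recall history, prompt the LLM
# with that context, then store the new result for next time.
def handle_submission(user_id, problem, code, recall, generate, retain):
    # 1. Pull relevant history from the memory layer
    past = recall(user_id, f"mistakes and patterns for {problem}")
    context = f"Student past history:\n{past}" if past else "No history yet."
    # 2. Ask the LLM for feedback grounded in that history
    feedback = generate(f"{context}\n\nProblem: {problem}\nCode: {code}")
    # 3. Store what happened so the next session remembers it
    retain(user_id, f"Problem: {problem} | Feedback: {feedback}")
    return feedback
```

In the real app, `recall` and `retain` hit Hindsight over HTTP and `generate` calls Groq; the shape of the flow is the same.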

The Three API Calls That Change Everything

Hindsight gives you three operations. Understanding how I used all three is the core of this article.

retain() — Store What Happened

Every time a student submits code, I store what happened:

import requests

# HINDSIGHT_BASE_URL and HEADERS (with the API key) are module-level config

def retain_memory(user_id, content):
    bank_id = f"coding-mentor-{user_id}"
    url = f"{HINDSIGHT_BASE_URL}/v1/default/banks/{bank_id}/memories"
    res = requests.post(url, headers=HEADERS, json={"content": content})
    return res

And I call it after every submission with rich context:

retain_memory(
    submission.user_id,
    f"Problem: {submission.problem} | Language: {submission.language} | "
    f"Score: {score} | Weaknesses: {weaknesses} | Strengths: {strengths}"
)

This is the foundation. Every submission becomes a memory node. The agent is always watching, always storing.

recall() — Pull What's Relevant

Before generating any feedback, I pull the student's relevant history:

def recall_memory(user_id, query):
    bank_id = f"coding-mentor-{user_id}"
    url = f"{HINDSIGHT_BASE_URL}/v1/default/banks/{bank_id}/memories/recall"
    res = requests.post(url, headers=HEADERS, json={"query": query, "limit": 5})
    if res.status_code == 200:
        memories = res.json().get("memories", [])
        return "\n".join([m.get("content", "") for m in memories])
    return ""

Then I pass that memory context directly into the prompt:

past = recall_memory(user_id, f"mistakes and patterns in {language}")
memory_context = f"Student past history:\n{past}" if past else "No history yet."

prompt = f"""You are an expert coding mentor.
{memory_context}

Current problem: {submission.problem}
Student code: {submission.code}

Analyze and give personalized feedback..."""

This is where the magic happens. The LLM isn't just looking at the current code — it's looking at the current code in the context of everything this student has ever done wrong.

reflect() — Synthesize Deep Patterns

recall() finds relevant memories. reflect() thinks across ALL of them to find patterns:

def reflect_memory(user_id, query):
    bank_id = f"coding-mentor-{user_id}"
    url = f"{HINDSIGHT_BASE_URL}/v1/default/banks/{bank_id}/reflect"
    res = requests.post(url, headers=HEADERS, json={"query": query})
    if res.status_code == 200:
        return res.json().get("reflection", "")
    return ""

I use this for the deeper features — the student's learning profile, their 30-day roadmap, their concept gap analysis. These need synthesis across the entire history, not just the five most relevant memories.
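Each of those deep features boils down to the same reflect call with a different synthesis query. A sketch of how that can be organized; the query strings and names here are my own, not the project's exact prompts, and the reflect call is injected:

```python
# One synthesis query per deep feature (illustrative wording)
SYNTHESIS_QUERIES = {
    "profile": "Summarize this student's overall strengths and weaknesses.",
    "roadmap": "Propose a 30-day study plan targeting their recurring mistakes.",
    "gaps": "Which CS concepts does this student consistently misapply?",
}

def synthesize(user_id, feature, reflect):
    # Each feature is just reflect() with its own whole-history query
    query = SYNTHESIS_QUERIES[feature]
    return reflect(user_id, query) or "Not enough history yet."
```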

The Before/After That Proves It Works

The Memory Replay feature shows the difference most clearly. Same code. Same problem. Two completely different responses.

Without Hindsight memory:

"Your loop logic has an error. You should check the boundary conditions more carefully. Consider what happens when the input is empty."

With Hindsight memory (after 5 submissions):

"This is the third time I've seen you miss the empty input edge case. Last week you had the same issue in your binary search implementation. This isn't a syntax problem — you have a consistent blind spot around edge cases. Before you write any more code today, I want you to stop and write out every possible edge case first."

Same code. Completely different response. The second one is actually useful.

This is what I built the Memory Replay tab for — judges, users, and anyone skeptical can submit code twice and see the difference live, side by side.
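The replay itself is conceptually simple: generate feedback twice from the same submission, once with an empty context and once with the recalled history. A sketch, with illustrative names and injected callables rather than the project's actual endpoint:

```python
# Same code, two prompts: one blind, one with recalled history
def memory_replay(user_id, problem, code, recall, generate):
    blind = generate(f"Problem: {problem}\nCode: {code}")
    past = recall(user_id, f"mistakes and patterns in {problem}")
    informed = generate(
        f"Student past history:\n{past}\n\nProblem: {problem}\nCode: {code}"
    )
    # The UI renders these two strings side by side
    return {"without_memory": blind, "with_memory": informed}
```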

The Mistake DNA Feature

One of the features I'm most proud of is Mistake DNA — a visual fingerprint of your unique error patterns.

After enough submissions, I use reflect() to analyze all stored memories and categorize mistakes into types:

prompt = f"""Based on this student history: {history}

Analyze their mistake patterns and respond in JSON:
- logic_errors: number (0-10)
- syntax_errors: number (0-10)  
- edge_case_blindness: number (0-10)
- naming_conventions: number (0-10)
- algorithm_choice: number (0-10)
- error_handling: number (0-10)
- top_mistake_type: string
- dna_summary: string"""

The result is a bar chart that shows your unique coding fingerprint. No two students have the same DNA. LeetCode gives everyone the same hint. This gives you a mirror.
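One practical wrinkle: the model's JSON reply usually arrives wrapped in prose or code fences, so it pays to parse it defensively before feeding the chart. A helper sketch of my own (not the project's code), assuming the six axes from the prompt above:

```python
import json
import re

# The six numeric axes the prompt asks for
DNA_AXES = ["logic_errors", "syntax_errors", "edge_case_blindness",
            "naming_conventions", "algorithm_choice", "error_handling"]

def parse_dna(raw):
    # Grab the first {...} blob; LLMs often wrap JSON in prose or fences
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        return None
    try:
        dna = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    # Clamp every axis to the 0-10 scale so the bar chart never overflows
    return {axis: max(0, min(10, int(dna.get(axis, 0)))) for axis in DNA_AXES}
```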

What Surprised Me

Hindsight's reflect() is the underrated one. I expected retain() and recall() to be the workhorses. They are — but reflect() is what makes the experience feel genuinely intelligent. When a student asks "what should I study next?" and the answer is built from synthesis across 20 past submissions, it feels less like a chatbot and more like a mentor who has been paying attention.

Per-user memory banks were the right call. I create a separate Hindsight bank for each student ID: coding-mentor-{user_id}. This keeps memories isolated, makes retrieval faster, and means one student's patterns don't bleed into another's feedback. Simple decision, big impact.

The first submission is always generic. This is actually fine — it sets expectations correctly. By the third submission, the difference is obvious. By the tenth, it's remarkable. The value compounds.

What I'd Do Differently

The biggest limitation right now is that I'm storing memories as plain text strings. A richer schema — structured JSON with explicit fields for mistake type, severity, problem category — would make recall() queries dramatically more precise. Right now I'm relying on semantic similarity to find relevant memories. Structured metadata would let me filter more precisely.
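As a sketch of what that richer schema could look like (this is a hypothetical design, not something in the repo; the field names and the injected `retain` callable are my assumptions):

```python
import json

# Store a JSON payload instead of a flat string, so later recalls
# can filter on explicit fields rather than semantic similarity alone
def retain_structured(user_id, retain, *, problem, category,
                      mistake_type, severity, score):
    record = {
        "problem": problem,
        "category": category,          # e.g. "arrays", "graphs"
        "mistake_type": mistake_type,  # e.g. "edge_case_blindness"
        "severity": severity,          # 1 (cosmetic) to 5 (fundamental)
        "score": score,
    }
    retain(user_id, json.dumps(record))
    return record
```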

I'd also trigger reflect() automatically every 5 submissions instead of only on explicit feature calls. The more often the agent synthesizes, the better the pattern recognition gets.
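That trigger could be as small as a per-user counter. A sketch of my own (not in the repo), with the reflect call injected; a real deployment would persist the counter instead of keeping it in process memory:

```python
from collections import defaultdict

# In-memory submission counter per user (illustrative only)
submission_counts = defaultdict(int)

def maybe_reflect(user_id, reflect, every=5):
    # Fire a synthesis pass automatically on every N-th submission
    submission_counts[user_id] += 1
    if submission_counts[user_id] % every == 0:
        return reflect(user_id, "Summarize the patterns in recent submissions.")
    return None
```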

The Lessons

  • Memory changes the product category. Without Hindsight, this is a code review tool. With Hindsight, it's a mentor. Same LLM, same prompts — memory is the difference.
  • retain() everything, even when it feels redundant. More context is always better. Storage is cheap. Personalization is valuable.
  • Show the before/after explicitly. Users don't notice memory working until you show them what it looks like without it. The Memory Replay feature taught me this.
  • Per-user banks scale cleanly. Don't try to store everyone in one bank and filter by metadata. Separate banks keep things fast and isolated.

Try It Yourself

The full project is open source. Clone it, add your Hindsight and Groq API keys, and run it locally in under 10 minutes.

GitHub: https://github.com/bonamukkala-bot/coding-mentor

If you want to add memory to your own agent, start with the Hindsight documentation and the Hindsight GitHub repository. The agent memory page on Vectorize is also worth reading before you design your memory schema.

The coding mentor that knows your weaknesses better than you do isn't magic. It's three API calls and a system that never forgets.
