<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Aditya Goyal</title>
    <description>The latest articles on DEV Community by Aditya Goyal (@aditya_goyal_1).</description>
    <link>https://dev.to/aditya_goyal_1</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F3838353%2F84535556-a95b-417b-bc6a-6cd05ff69fcc.jpg</url>
      <title>DEV Community: Aditya Goyal</title>
      <link>https://dev.to/aditya_goyal_1</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/aditya_goyal_1"/>
    <language>en</language>
    <item>
      <title>Hindsight Made My Study Agent Learn</title>
      <dc:creator>Aditya Goyal</dc:creator>
      <pubDate>Sun, 22 Mar 2026 13:11:35 +0000</pubDate>
      <link>https://dev.to/aditya_goyal_1/hindsight-made-my-study-agent-learn-3bl2</link>
      <guid>https://dev.to/aditya_goyal_1/hindsight-made-my-study-agent-learn-3bl2</guid>
      <description>&lt;p&gt;“Why are you giving me recursion problems again?” I checked the logs and realized the agent wasn’t repeating randomly — it had remembered my past mistakes and was deliberately making me practice my weak topics.&lt;/p&gt;

&lt;p&gt;That was the moment this project stopped being a chatbot and started behaving more like a tutor.&lt;/p&gt;

&lt;p&gt;I originally set out to build a simple AI chatbot that could help with studying and coding practice. The idea was straightforward: chat with an AI, generate quizzes, solve coding problems, and track progress. But very quickly I ran into a problem — the AI forgot everything between sessions. It didn’t remember what I struggled with, what mistakes I made, or what I was trying to improve.&lt;/p&gt;

&lt;p&gt;That’s when I realized the interesting problem wasn’t building another chatbot. The interesting problem was building an agent that could learn from a user over time.&lt;/p&gt;

&lt;h2&gt;
  
  
  What I Built
&lt;/h2&gt;

&lt;p&gt;The system I ended up building is a multi-user AI study assistant that includes:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;AI chat mentor&lt;/li&gt;
&lt;li&gt;Coding practice and mistake tracking&lt;/li&gt;
&lt;li&gt;Progress tracking&lt;/li&gt;
&lt;li&gt;Study plan generation&lt;/li&gt;
&lt;li&gt;A structured memory system&lt;/li&gt;
&lt;li&gt;A dashboard showing weak topics and progress&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;You can try it here: &lt;a href="https://edupilot-three.vercel.app/" rel="noopener noreferrer"&gt;EduPilot&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5oe1djixjez9iw9rvj5.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fo5oe1djixjez9iw9rvj5.png" alt="Dashboard Page" width="800" height="396"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmz7ltlaxnzcaequxj0fs.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fmz7ltlaxnzcaequxj0fs.png" alt="Insight Page" width="800" height="375"&gt;&lt;/a&gt;&lt;br&gt;
Technically, the system is built with a Next.js frontend hosted on Vercel, Supabase for authentication and database storage, and a language model accessed through the Groq API. The interesting part is the memory layer, where I used Hindsight to extract structured memory from interactions and feed it back into the model later.&lt;/p&gt;

&lt;p&gt;Instead of treating each message independently, the system stores structured learning signals like weak topics, mistakes, goals, and progress, and uses them to influence future responses.&lt;/p&gt;
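
&lt;p&gt;To make that concrete, here is a sketch of what one extracted record might look like. The field names below are illustrative, not the project's exact schema:&lt;/p&gt;

```javascript
// A hypothetical extraction record produced after one chat turn.
// Field names are illustrative; the real schema may differ.
const extraction = {
  userId: "user_123",
  weakTopics: ["recursion", "dynamic programming"],
  mistakes: [
    {
      topic: "recursion",
      type: "missing_base_case",
      description: "Forgot the base case in a factorial implementation",
    },
  ],
  goals: ["Pass the upcoming DSA exam"],
  progress: [{ topic: "arrays", status: "improving" }],
};

// Because the signals are structured, later steps can reason over
// them directly, e.g. find weak topics with recorded mistakes:
const focusTopics = extraction.weakTopics.filter((t) =>
  extraction.mistakes.some((m) => m.topic === t)
);
```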
&lt;h2&gt;
  
  
  The Problem With Stateless Chatbots
&lt;/h2&gt;

&lt;p&gt;Most AI assistants today are stateless. They respond very well within a single conversation, but they don’t build long-term understanding of a user. They don’t know what you struggled with last week, which topics you are weak in, or whether you are improving over time.&lt;/p&gt;

&lt;p&gt;For a learning system, this is a big limitation. A real teacher remembers your mistakes, tracks your progress, and adjusts what they teach you. I wanted to see what would happen if an AI assistant could do something similar.&lt;/p&gt;

&lt;p&gt;At first, I tried just storing chat history and sending it back as context, but that quickly became messy and inefficient. Raw chat logs are not really memory — they are just transcripts. What I actually needed was structured memory that the system could reason about.&lt;/p&gt;

&lt;p&gt;This is where Hindsight became useful. Instead of only reacting in the moment, the agent periodically looks back at interactions and extracts structured information that can be used later.&lt;/p&gt;

&lt;p&gt;If you’re interested in the concept, here are useful resources:&lt;/p&gt;

&lt;p&gt;&lt;a href="https://github.com/vectorize-io/hindsight" rel="noopener noreferrer"&gt;Hindsight GitHub repository&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://hindsight.vectorize.io/" rel="noopener noreferrer"&gt;Hindsight documentation&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;&lt;a href="https://vectorize.io/features/agent-memory" rel="noopener noreferrer"&gt;Agent memory page on Vectorize&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;These explain why structured memory is important for agent systems.&lt;/p&gt;
&lt;h2&gt;
  
  
  Memory Extraction Instead of Raw Chat Logs
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faqgnw2fdivo1a6etsft4.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Faqgnw2fdivo1a6etsft4.jpg" alt="Dataflow" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;After each chat or coding session, the system runs a second step that extracts structured memory from the interaction. Instead of storing the entire conversation, it stores things like weak topics, mistakes, goals, and progress updates.&lt;/p&gt;

&lt;p&gt;Conceptually, it looks like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;extraction&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;extractTurnMemory&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;userMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;assistantReply&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;insertMemoryExtraction&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;weakTopics&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;extraction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;weakTopics&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;mistakes&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;extraction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;mistakes&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;goals&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;extraction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;goals&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;progress&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;extraction&lt;/span&gt;&lt;span class="p"&gt;.&lt;/span&gt;&lt;span class="nx"&gt;progress&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;This structured memory is stored in tables like:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;memory_extractions&lt;/li&gt;
&lt;li&gt;progress_events&lt;/li&gt;
&lt;li&gt;coding_mistakes&lt;/li&gt;
&lt;li&gt;chat_history&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This makes it much easier to query and use memory later compared to searching through long chat logs.&lt;/p&gt;
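
&lt;p&gt;As a rough sketch of what retrieval over these records can look like, here is an in-memory version of the idea. The real retrieveRelevantExtractions presumably queries Supabase; this stand-in just scores stored extractions by overlap with the current message:&lt;/p&gt;

```javascript
// In-memory sketch of memory retrieval: score stored extractions by
// keyword overlap with the current message, return the best matches.
// This stands in for a real database query; names are illustrative.
function retrieveRelevantExtractions(extractions, userMessage, limit = 3) {
  const words = new Set(userMessage.toLowerCase().split(/\W+/));
  return extractions
    .map((e) => ({
      record: e,
      score: e.weakTopics.filter((t) => words.has(t)).length,
    }))
    .filter((s) => s.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, limit)
    .map((s) => s.record);
}

const stored = [
  { weakTopics: ["recursion"], mistakes: ["missing base case"] },
  { weakTopics: ["sql"], mistakes: ["wrong join type"] },
];
const relevant = retrieveRelevantExtractions(stored, "Help me with recursion");
```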
&lt;h2&gt;
  
  
  Retrieval Before Every Response
&lt;/h2&gt;

&lt;p&gt;Storing memory is only half the system. The other half is retrieval.&lt;/p&gt;

&lt;p&gt;Before generating a response, the system retrieves relevant past memory and includes it in the model prompt. The flow looks roughly like this:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;extractions&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;retrieveRelevantExtractions&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt; &lt;span class="nx"&gt;userMessage&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;
&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;mistakes&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;getRecentCodingMistakes&lt;/span&gt;&lt;span class="p"&gt;(&lt;/span&gt;&lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;);&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;memoryContext&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="nf"&gt;formatMemoryForPrompt&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;extractions&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;mistakes&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;span class="kd"&gt;const&lt;/span&gt; &lt;span class="nx"&gt;response&lt;/span&gt; &lt;span class="o"&gt;=&lt;/span&gt; &lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;generateLLMResponse&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="na"&gt;message&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;userMessage&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;memory&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="nx"&gt;memoryContext&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Because of this, the model can respond with awareness of the user’s history. For example, it might say something like:&lt;/p&gt;

&lt;blockquote&gt;
&lt;p&gt;You struggled with recursion base cases earlier. Let’s review that before trying this problem.&lt;/p&gt;
&lt;/blockquote&gt;

&lt;p&gt;That behavior is not hardcoded — it emerges from feeding structured memory back into the model.&lt;/p&gt;
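
&lt;p&gt;One plausible way formatMemoryForPrompt could turn records into prompt text (a guess at the approach, not the project's actual implementation):&lt;/p&gt;

```javascript
// Sketch of turning structured memory into a prompt preamble.
// The real formatMemoryForPrompt may differ; this shows the idea:
// collapse extractions into a few plain-text lines the LLM can use.
function formatMemoryForPrompt({ extractions, mistakes }) {
  const lines = [];
  const weak = [...new Set(extractions.flatMap((e) => e.weakTopics))];
  if (weak.length > 0) {
    lines.push(`Weak topics: ${weak.join(", ")}`);
  }
  for (const m of mistakes) {
    lines.push(`Recent mistake (${m.topic}): ${m.description}`);
  }
  return lines.join("\n");
}

const memoryContext = formatMemoryForPrompt({
  extractions: [
    { weakTopics: ["recursion"] },
    { weakTopics: ["recursion", "graphs"] },
  ],
  mistakes: [{ topic: "recursion", description: "missing base case" }],
});
```

Prepending this context to the system prompt is what lets a reply like the one above emerge without any hardcoded rules.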

&lt;h2&gt;
  
  
  Coding Practice and Mistake Tracking
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4tr0g93qy477bfcej9y9.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2F4tr0g93qy477bfcej9y9.png" alt="Code Editor" width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffewf8911uh49y1k79god.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Ffewf8911uh49y1k79god.png" alt=" " width="800" height="450"&gt;&lt;/a&gt;&lt;br&gt;
One part of the system that turned out to be very useful was coding mistake tracking. When a user submits a solution, the system records the problem, topic, mistake type, and description.&lt;/p&gt;

&lt;p&gt;Conceptually:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight javascript"&gt;&lt;code&gt;&lt;span class="k"&gt;await&lt;/span&gt; &lt;span class="nf"&gt;insertCodingMistake&lt;/span&gt;&lt;span class="p"&gt;({&lt;/span&gt;
  &lt;span class="nx"&gt;userId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="nx"&gt;problemId&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;mistakeType&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;off_by_one&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="p"&gt;,&lt;/span&gt;
  &lt;span class="na"&gt;description&lt;/span&gt;&lt;span class="p"&gt;:&lt;/span&gt; &lt;span class="dl"&gt;"&lt;/span&gt;&lt;span class="s2"&gt;Loop boundary error in array traversal&lt;/span&gt;&lt;span class="dl"&gt;"&lt;/span&gt;
&lt;span class="p"&gt;});&lt;/span&gt;

&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Over time, the system builds a profile of common mistakes and weak topics, which are then used to generate study plans and recommendations.&lt;/p&gt;

&lt;p&gt;This turned the system from a chatbot into something closer to a learning tracker.&lt;/p&gt;
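
&lt;p&gt;The "profile of common mistakes" can be as simple as counting mistake records per topic. A minimal sketch (the helper name is mine, not from the project):&lt;/p&gt;

```javascript
// Count recorded mistakes per topic to surface weak areas.
// A real version would aggregate over the coding_mistakes table;
// this one takes an in-memory array of records.
function buildMistakeProfile(mistakes) {
  const counts = {};
  for (const m of mistakes) {
    counts[m.topic] = (counts[m.topic] ?? 0) + 1;
  }
  // Return [topic, count] pairs, most frequent mistakes first.
  return Object.entries(counts).sort((a, b) => b[1] - a[1]);
}

const profile = buildMistakeProfile([
  { topic: "recursion", mistakeType: "missing_base_case" },
  { topic: "recursion", mistakeType: "off_by_one" },
  { topic: "arrays", mistakeType: "off_by_one" },
]);
// profile[0] is now the weakest topic with its mistake count,
// ready to feed into study plan generation.
```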

&lt;h2&gt;
  
  
  System Architecture
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdxb85i9roq1i83n5dd2s.jpg" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdxb85i9roq1i83n5dd2s.jpg" alt="System Architecture" width="800" height="436"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;The architecture is built around a memory loop rather than a simple request-response system.&lt;/p&gt;

&lt;p&gt;The flow looks like this:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;User sends a message or submits code.&lt;/li&gt;
&lt;li&gt;API retrieves relevant past memory from the database.&lt;/li&gt;
&lt;li&gt;Message + memory are sent to the LLM.&lt;/li&gt;
&lt;li&gt;LLM generates a response.&lt;/li&gt;
&lt;li&gt;Memory extraction step runs using Hindsight.&lt;/li&gt;
&lt;li&gt;Structured memory is stored in Supabase.&lt;/li&gt;
&lt;li&gt;Dashboard and study plan are updated.&lt;/li&gt;
&lt;li&gt;Future interactions use stored memory.&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;This loop allows the agent to gradually learn about the user over time instead of treating every interaction as stateless.&lt;/p&gt;
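
&lt;p&gt;The whole loop can be sketched in a few lines, with every external step stubbed out. In the real system these stubs would be Supabase queries, a Groq completion call, and a Hindsight extraction step:&lt;/p&gt;

```javascript
// The memory loop with every external call replaced by an
// in-memory stub, so the shape of the loop is visible.
function handleMessage(userId, userMessage, db) {
  // Retrieve this user's stored memory (stands in for a DB query).
  const memory = db.extractions.filter((e) => e.userId === userId);
  // Stand-in for the LLM call, which would see the memory context.
  const reply = `(reply aware of ${memory.length} past extractions)`;
  // Stand-in for Hindsight extraction plus the Supabase insert.
  db.extractions.push({ userId, weakTopics: ["recursion"] });
  // The dashboard reads db.extractions; the next call sees more memory.
  return reply;
}

const db = { extractions: [] };
const first = handleMessage("u1", "explain recursion", db);
const second = handleMessage("u1", "quiz me", db);
// Each turn leaves behind memory that the next turn retrieves,
// which is the difference from a stateless request-response loop.
```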

&lt;h2&gt;
  
  
  Structured Memory Storage
&lt;/h2&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczq7xqzu21r6oppfyo4f.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fczq7xqzu21r6oppfyo4f.png" alt="Supabase Schema" width="800" height="363"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;Instead of storing raw chat logs, the system stores structured learning signals in the database. These include weak topics, coding mistakes, progress events, and goals. The main tables are memory_extractions, progress_events, coding_mistakes, and chat_history.&lt;/p&gt;

&lt;p&gt;This structure makes it easier to retrieve relevant memory later and use it to influence the model’s responses. Instead of searching through long chat histories, the system can directly retrieve things like weak topics or recent mistakes and include them in the prompt.&lt;/p&gt;

&lt;p&gt;&lt;a href="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg5qo9l6ksui7fdngmak.png" class="article-body-image-wrapper"&gt;&lt;img src="https://media2.dev.to/dynamic/image/width=800%2Cheight=%2Cfit=scale-down%2Cgravity=auto%2Cformat=auto/https%3A%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Farticles%2Fdg5qo9l6ksui7fdngmak.png" alt=" " width="800" height="363"&gt;&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  Challenges I Ran Into
&lt;/h2&gt;

&lt;p&gt;One of the biggest challenges was deployment. Initially, I used SQLite and local file storage, which worked perfectly in development but failed in the serverless environment on Vercel because the filesystem is read-only. I had to migrate everything to Supabase PostgreSQL and rewrite several parts of the backend to remove local file usage.&lt;/p&gt;

&lt;p&gt;Another challenge was extracting structured memory reliably from LLM responses. Getting consistent JSON output and validating it properly took more time than expected. I also had to design the memory schema carefully so that it was useful for retrieval and not just a dump of data.&lt;/p&gt;
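
&lt;p&gt;The kind of defensive parsing this involves looks roughly like the following. This is a hedged sketch of the general technique, not the project's actual code; models often wrap JSON in prose, so it slices out the outermost braces before parsing and then validates field types:&lt;/p&gt;

```javascript
// Defensively parse an LLM reply that should contain a JSON object.
// Slice from the first "{" to the last "}" to drop surrounding prose,
// then validate that expected fields are arrays before using them.
function parseExtraction(raw) {
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end === -1) return null;
  let data;
  try {
    data = JSON.parse(raw.slice(start, end + 1));
  } catch {
    return null;
  }
  // Coerce expected fields to arrays so downstream code stays safe
  // even when the model omits or mistypes a field.
  return {
    weakTopics: Array.isArray(data.weakTopics) ? data.weakTopics : [],
    mistakes: Array.isArray(data.mistakes) ? data.mistakes : [],
  };
}

const messy = 'Sure! {"weakTopics": ["recursion"]} Hope that helps.';
const parsed = parseExtraction(messy);
```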

&lt;p&gt;Authentication and multi-user data separation were also tricky at first. Every table had to include a user_id so that memory and progress were stored separately for each user.&lt;/p&gt;

&lt;h2&gt;
  
  
  Lessons Learned
&lt;/h2&gt;

&lt;p&gt;A few things I learned from building this project:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Memory should be structured, not raw chat logs.&lt;/li&gt;
&lt;li&gt;Retrieval is more important than storage.&lt;/li&gt;
&lt;li&gt;Tracking user mistakes is very useful for personalization.&lt;/li&gt;
&lt;li&gt;Serverless environments change how you design backend systems.&lt;/li&gt;
&lt;li&gt;Agents become much more interesting when they learn over time.&lt;/li&gt;
&lt;li&gt;A memory loop is more important than a bigger context window.&lt;/li&gt;
&lt;/ul&gt;

&lt;h2&gt;
  
  
  Closing Thoughts
&lt;/h2&gt;

&lt;p&gt;This project started as a simple chatbot for studying, but the interesting part ended up being the memory system and the Hindsight loop.&lt;/p&gt;

&lt;p&gt;The main takeaway for me is this: the most useful agents are not the ones that respond best once, but the ones that learn and improve over time.&lt;/p&gt;

&lt;p&gt;Adding memory — especially structured memory extracted using Hindsight — turns a stateless chatbot into a system that evolves with the user. And once you see that behavior in practice, it’s very hard to go back to building stateless chatbots.&lt;/p&gt;

</description>
      <category>ai</category>
      <category>machinelearning</category>
      <category>webdev</category>
      <category>supabase</category>
    </item>
  </channel>
</rss>
