I Built an AI That Remembers How You Code — Here's What Happened
The screen split open and I just stared at it.
Left side — Kōdo mid-conversation, knowing exactly what the user had struggled with last week.
Right side — a Monaco editor sliding in, pre-loaded with a problem Kōdo had picked specifically
for this user's weak areas. No refresh. No navigation. Just — split. Smooth. Alive.
That was the moment I knew we had built something real.
What We Built
Kōdo is an AI coding mentor with persistent memory. Not "memory" in the sense of
"we stored your chat history in a database." Memory in the sense of — Kōdo watches how you
think across sessions, builds a behavioral profile of you, and becomes a fundamentally
different tutor the longer you use it.
Every other AI tutor resets when you close the tab. Kōdo doesn't.
The core features:
- Comeback Brief — Every session opens with a personalized summary of where you left off. Kōdo tells you what you were working on, what was left unfinished, and what needs attention today. You didn't set a reminder. Kōdo just knows.
- Pattern Interrupts — When you ask for help, Kōdo checks your history first. If you've hit the same wall four times, it calls that out before explaining anything. "This is the third time you've struggled at the implementation phase on a problem you understood conceptually. Want to talk about that first?"
- Adaptive Explanations — Kōdo remembers which analogies worked for you. If the stack analogy clicked for recursion last Tuesday, it uses that framing again when you hit DP.
- Built-in Editor — Monaco Editor (same engine as VS Code) embedded directly in the app. Syntax highlighting, autocomplete, error squiggles — and code execution via Piston API. No context switching.
- Problem Recommendation — Kōdo suggests problems based on your weak areas and tracks what you've already solved so you never get the same problem twice.
- Weekly Insight Card — Not a list of problems solved. A behavioral analysis. What patterns Kōdo noticed. What you keep avoiding. What's about to be forgotten.
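Since the editor runs code through Piston's public execute endpoint, here's a sketch of the request body that endpoint expects. The helper name is ours and the version string is illustrative, not Kōdo's actual code:

```python
# Sketch of a request payload for Piston's public /execute endpoint.
# Helper name and version string are illustrative, not from the Kōdo codebase.
PISTON_URL = "https://emkc.org/api/v2/piston/execute"

def build_piston_request(language: str, source: str, stdin: str = "") -> dict:
    """Build the JSON body Piston expects for a code run."""
    return {
        "language": language,
        "version": "3.10.0",  # illustrative; pick a version Piston has installed
        "files": [{"name": "main", "content": source}],
        "stdin": stdin,
    }

payload = build_piston_request("python", "print(2 + 2)")
# A real call would be something like:
#   requests.post(PISTON_URL, json=payload, timeout=10)
```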
The Stack
FastAPI backend, Next.js frontend, Monaco Editor for the code environment, Piston API for
execution, Groq for the LLM, and Hindsight for
the memory layer that makes all of it possible.
Hindsight is a memory system built specifically for AI
agents. It doesn't store raw conversation text — it extracts structured facts, builds a
knowledge graph, and synthesizes observations over time. When Kōdo says "you've been avoiding
graph problems for three weeks" — that insight came from Hindsight's observation consolidation,
not from us writing any pattern detection logic.
How We Used Hindsight
Three API calls. Each one doing something distinct.
retain() — Called after every chat message and problem attempt. We don't store the raw
conversation. We store behavioral inferences — what the interaction reveals about how this
person thinks:
```python
memory.retain(
    userid=user_id,
    content=f"User asked: {message}\nKōdo responded: {response}"
)
```
The richer the content string, the better the memory. Hindsight's LLM extracts facts from
whatever you feed it. We learned early that "user failed p003" is useless. "User attempted
Climbing Stairs (DP, easy), failed after 2 attempts, code shows they understood the recurrence
but implemented the wrong base case" — that's a memory worth having.
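To keep retained memories that descriptive, you can imagine a small formatter that turns an attempt record into a fact-dense sentence for Hindsight's extractor. This is a hypothetical helper — the field names are our assumptions, not Kōdo's actual schema:

```python
def describe_attempt(problem: dict, attempts: int, passed: bool, diagnosis: str) -> str:
    """Format a problem attempt as a rich, fact-dense memory string.
    Hypothetical helper — field names are assumptions, not Kōdo's schema."""
    outcome = "solved it" if passed else f"failed after {attempts} attempts"
    return (
        f"User attempted {problem['title']} "
        f"({', '.join(problem['tags'])}, {problem['difficulty']}), "
        f"{outcome}; {diagnosis}"
    )

note = describe_attempt(
    {"title": "Climbing Stairs", "tags": ["DP"], "difficulty": "easy"},
    attempts=2, passed=False,
    diagnosis="code shows they understood the recurrence but implemented the wrong base case",
)
```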
recall() — Called before every response to fetch relevant context. When a user is
solving a graph problem, we don't recall everything — we recall specifically their history
with graph problems:
```python
recall_query = f"user's history with {problem['tags']} problems and their common struggles"
existing_memory = memory.recall(user_id, recall_query)
```
reflect() — This is the powerful one. Unlike recall which fetches raw facts, reflect
runs an agentic reasoning loop over the full memory bank and synthesizes an answer. We use it
for the Comeback Brief, pattern interrupt detection, problem recommendations, and the Weekly
Insight Card:
```python
weekly_summary = memory.reflect(
    user_id,
    "Analyze this user's learning patterns from the past week...",
    budget="high"
)
```
The budget parameter controls how deep Hindsight searches — low for speed, high for
complex synthesis. Weekly insights use high. Quick session openers use mid.
You can learn more about how Hindsight handles
agent memory here.
The Hard Parts
Naming a file hindsight.py — Our first import error was a circular import because we
named our memory module the same as the package. Python imported our file instead of the
library. Renamed it memory.py. Five minutes wasted, lesson learned forever.
New users with no memory bank — Hindsight throws a 404 when you try to recall from a
user that doesn't exist yet. Every memory.recall() and memory.reflect() call needed a
try/except or new users crashed the entire endpoint on first message.
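A minimal sketch of the guard we mean, assuming the SDK's exception carries the HTTP status in its message (the exact exception type is a stand-in — adapt it to whatever Hindsight's client actually raises):

```python
def safe_recall(memory, user_id: str, query: str):
    """Return recalled memories, or an empty result for brand-new users.
    The exception shape is an assumption — match the SDK's real error type."""
    try:
        return memory.recall(user_id, query)
    except Exception as exc:  # first-time users have no memory bank yet (HTTP 404)
        if "404" in str(exc):
            return []          # treat "no memories yet" as an empty history
        raise                  # anything else is a real error — surface it
```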
LLMs don't always return clean JSON — We use Groq for intent detection — figuring out
if the user wants a coding problem and what topic. Sometimes Groq wraps the JSON in markdown
fences. We strip them before parsing:
```python
intent_data = json.loads(
    intent.strip().strip("```json").strip("```").strip()
)
```
Not elegant. But it works.
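Worth noting: `str.strip` takes a character set, not a prefix, so the chain above only works because the JSON payload happens to start with `{`. A slightly sturdier version — our own sketch, not the hackathon code — pulls the fenced body out with a regex:

```python
import json
import re

def parse_llm_json(text: str) -> dict:
    """Extract JSON from an LLM reply that may be wrapped in markdown fences.
    Sketch only — real code should also handle replies with no JSON at all."""
    match = re.search(r"```(?:json)?\s*(.*?)\s*```", text, re.DOTALL)
    body = match.group(1) if match else text
    return json.loads(body.strip())

parse_llm_json('```json\n{"intent": "problem", "topic": "dp"}\n```')
```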
Topic matching — Hindsight's reflect might return "dynamic programming" when your tag is
"dp". We fixed this by explicitly listing valid tags in the reflect prompt so the output
always matches something in our problem bank.
What I'd Do Differently
Build the memory schema first. Before writing a single endpoint, decide exactly what you're
going to retain and why. We retrofitted this and it showed — early sessions stored vague
strings that Hindsight couldn't extract much from. The quality of your memory is only as good
as what you put in.
Don't underestimate reflect(). Most teams at this hackathon probably used retain and recall
and called it done. Reflect is where the magic actually lives — it's not just retrieval, it's
reasoning. The Comeback Brief, the pattern interrupts, the weekly insights — none of that
exists without reflect doing real work.
Separate your frontend and backend folders from day one. We didn't. Deployment was messier
than it needed to be.
The Moment
The screen split open and I just stared at it.
Kōdo had remembered. It had picked the right problem. It knew this user's history without
being told. And the editor just — appeared. Right there. Ready.
We built this in 36 hours. With a memory system we'd never used before, a hackathon deadline
breathing down our necks, and a lot of trust that the pieces would come together.
They did.
Kōdo is open source — check out the repo.