Why I used Neo4j and Hindsight instead of a bigger prompt
Every AI tutor I had encountered before this project had the same flaw: it forgot you the moment you closed the tab. Next session, you were a stranger again. You'd struggle with the same concept — recursion, maybe, or Bayes' theorem — and the agent would explain it from scratch, in the same way, with the same examples that didn't land the first time. It had no memory of what you already knew, what confused you, or how you learned best. It was a very expensive flashcard.
That's the problem Smart Tutor was built to fix. And building it taught me something I didn't expect: the intelligence in an adaptive learning agent has less to do with the model you choose and more to do with the data infrastructure you build underneath it.
What Smart Tutor actually does
Smart Tutor is an AI-powered adaptive learning agent. A student logs in, works through questions, and receives explanations and follow-up problems tuned to their comprehension level. So far, that's not unusual. What's different is that the system builds a persistent profile of each student across sessions — tracking which concepts they understand, where they repeatedly trip up, and how their performance changes over time.
The memory layer is powered by Hindsight, an agent memory library built for exactly this kind of longitudinal, per-user storage. Every interaction — right answers, wrong answers, hesitation patterns, concept revisits — gets persisted via Hindsight's retain API and surfaced at inference time when the agent needs to decide what to teach next.
My role on the team was Dev 6: Content + Infrastructure. I owned the question bank (PostgreSQL), the curriculum knowledge graph (Neo4j), the Docker Compose setup, GitHub Actions CI/CD, observability tooling, and the ingestion scripts that let educators update the curriculum without touching code. This is the story of those decisions.
Why a relational question bank isn't enough on its own
The question bank was the easy part. A PostgreSQL table with questions, answer options, difficulty levels, and a concept tag. Standard stuff. The schema is roughly:
CREATE TABLE questions (
  id         UUID PRIMARY KEY,
  concept_id TEXT NOT NULL,
  difficulty INT  NOT NULL,  -- 1 to 5
  body       TEXT NOT NULL,
  options    JSONB,
  answer     TEXT NOT NULL
);
What this table can't tell you is what to do when a student gets a question on recursion wrong. Should you show them another recursion question at lower difficulty? Or do they need to revisit function calls first? Or variable scope? The answer depends on their specific gap — and that gap is structural. It lives in the relationship between concepts, not in any single question row.
That's where the knowledge graph comes in. I modelled the curriculum as a directed graph in Neo4j: concept nodes connected by PREREQ edges. When a student struggles with a concept, the agent doesn't just drop the difficulty — it traverses the graph to find the prerequisite concepts they likely haven't mastered yet.
MATCH path = (c:Concept {id: $concept})<-[:PREREQ*1..3]-(prereq:Concept)
RETURN prereq.id, prereq.label
ORDER BY length(path) ASC
Pair this with Hindsight's memory layer, which stores the student's performance history per concept, and you have something genuinely useful: the agent can look up which prerequisites the student has already demonstrated competence in, and route them only to the gap that's actually blocking them.
The Hindsight docs describe this pattern well — retain what the student did, recall it the next time a decision needs to be made. In practice, our retain calls fire on every answered question and store concept ID, correctness, difficulty level, and timestamp. Recall queries return the full performance vector for a student across any set of concept IDs.
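To make the retain/recall pattern concrete, here is a minimal in-memory sketch of the shape our calls take. This is not Hindsight's actual API — the class, method names, and record fields below are illustrative stand-ins for the real library's retain and recall operations.

```javascript
// In-memory sketch of the retain/recall pattern. Names are hypothetical,
// not Hindsight's real API.
class MemoryStore {
  constructor() {
    this.events = []; // one record per answered question
  }

  // Fired on every answered question: concept, correctness, difficulty, timestamp.
  retain({ studentId, conceptId, correct, difficulty }) {
    this.events.push({ studentId, conceptId, correct, difficulty, ts: Date.now() });
  }

  // Returns the performance vector for a student across a set of concept IDs.
  recall(studentId, conceptIds) {
    const vector = {};
    for (const id of conceptIds) {
      const hits = this.events.filter(
        (e) => e.studentId === studentId && e.conceptId === id
      );
      vector[id] = {
        attempts: hits.length,
        correct: hits.filter((e) => e.correct).length,
      };
    }
    return vector;
  }
}

const store = new MemoryStore();
store.retain({ studentId: 's1', conceptId: 'recursion', correct: false, difficulty: 3 });
store.retain({ studentId: 's1', conceptId: 'function-calls', correct: true, difficulty: 2 });
console.log(store.recall('s1', ['recursion', 'function-calls']));
```

The key design point is on the recall side: the agent asks for exactly the concept IDs the graph traversal surfaced, rather than being handed the student's entire history.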
The infrastructure that makes it run
Running two databases (PostgreSQL + Neo4j) alongside a Node server and the Hindsight integration could have been a pain to set up locally. We kept it simple: a Docker Compose file that brings up the full stack with one command.
services:
  postgres:
    image: postgres:15
    environment:
      POSTGRES_DB: smart_tutor
  neo4j:
    image: neo4j:5
    ports: ['7474:7474', '7687:7687']
  app:
    build: .
    depends_on: [postgres, neo4j]
The GitHub Actions pipeline runs on every push to main: lint, test, Docker build, and deploy. Datadog handles metrics (query latency, session length, concept coverage per student), and Sentry catches runtime errors — important when your agent is making decisions based on graph traversals that could hit cycles or missing nodes.
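For orientation, the workflow looks roughly like this — a sketch only, with illustrative job and step names rather than the repo's actual configuration:

name: ci
on:
  push:
    branches: [main]
jobs:
  build:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - run: npm ci
      - run: npm run lint
      - run: npm test
      - run: docker build -t smart-tutor .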
One early issue: Neo4j prerequisite chains can be circular if a curriculum designer isn't careful. We added a cycle detection step in the ingestion script and a CI check that fails the pipeline if any new concept data would introduce a cycle in the graph.
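The cycle check itself is a standard depth-first search with three-color marking. A sketch of what the ingestion script does (function and variable names here are illustrative, not copied from the repo):

```javascript
// Detect cycles in PREREQ edges before writing to Neo4j. Edges are
// [from, to] pairs meaning "from is a prerequisite of to".
function hasCycle(edges) {
  const adj = new Map();
  for (const [from, to] of edges) {
    if (!adj.has(from)) adj.set(from, []);
    adj.get(from).push(to);
  }
  const WHITE = 0, GRAY = 1, BLACK = 2; // unvisited / in progress / done
  const color = new Map();
  const visit = (node) => {
    color.set(node, GRAY);
    for (const next of adj.get(node) || []) {
      const c = color.get(next) || WHITE;
      if (c === GRAY) return true; // back edge: cycle found
      if (c === WHITE && visit(next)) return true;
    }
    color.set(node, BLACK);
    return false;
  };
  for (const node of adj.keys()) {
    if ((color.get(node) || WHITE) === WHITE && visit(node)) return true;
  }
  return false;
}

console.log(hasCycle([['a', 'b'], ['b', 'c']]));              // false
console.log(hasCycle([['a', 'b'], ['b', 'c'], ['c', 'a']]));  // true
```

The CI check runs the same function over the merged graph (existing edges plus the incoming import), so a cycle fails the pipeline before anything reaches the database.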
Making curriculum updates engineer-free
One design goal from the start: educators should be able to add new questions and concepts without filing an engineering ticket. The ingestion pipeline takes a simple CSV or YAML input — question body, options, answer, concept tag, difficulty — and handles the rest: inserting rows into PostgreSQL, creating or updating nodes in Neo4j, and validating that all referenced concept IDs exist in the graph.
The script validates prerequisite relationships against the existing graph before writing anything, so a badly structured concept import fails loudly rather than silently corrupting the knowledge graph. In a production system, this matters: a broken prerequisite chain means a student gets routed to content they're not ready for, which is worse than no recommendation at all.
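The "fail loudly" validation is simple set membership. A minimal sketch, assuming the question records carry a `conceptId` field (helper name and shapes are hypothetical):

```javascript
// Reject an import if any question references a concept ID that does not
// already exist in the knowledge graph. Throwing here aborts the whole
// import before any rows or nodes are written.
function validateImport(questions, knownConceptIds) {
  const known = new Set(knownConceptIds);
  const missing = questions
    .map((q) => q.conceptId)
    .filter((id) => !known.has(id));
  if (missing.length > 0) {
    throw new Error(`Unknown concept IDs: ${[...new Set(missing)].join(', ')}`);
  }
}

// Passes silently: every referenced concept exists.
validateImport([{ conceptId: 'recursion' }], ['recursion', 'function-calls']);
```

Running the check before the first write is what makes the failure mode safe: a rejected import leaves both databases untouched.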
What I actually learned
A few things that would have saved me time if I'd known them up front:
• Graph traversal is fast until your chains get deep. PREREQ queries with *1..3 hops are fine. The moment you increase that to *1..10 without indexing, you feel it. Index your concept IDs early.
• Hindsight removed an entire class of prompt engineering. Before we integrated it, we were trying to encode the student's history in the system prompt — a growing, fragile context blob that hit token limits and was hard to update. Hindsight's retain/recall pattern is just cleaner. The agent asks for what it needs; it doesn't receive everything and hope for the best.
• Observability is not optional when agents make invisible decisions. The agent deciding to reroute a student from recursion to variable scope is a decision you can't see in the UI. Without Datadog logging the graph traversal results and Hindsight recall outputs, debugging a wrong recommendation would be nearly impossible.
• The dual-database setup is worth it. PostgreSQL and Neo4j serve genuinely different purposes here. Trying to do prerequisite traversal in SQL with a self-join adjacency list is painful and doesn't scale. Neo4j makes the graph queries readable and fast. The overhead of running two stores is justified when they each do what they're good at.
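On the indexing point in the first bullet: in Neo4j 5 the fix is a one-line range index on the property the traversals anchor on, along the lines of:

CREATE INDEX concept_id IF NOT EXISTS FOR (c:Concept) ON (c.id);

Without it, every PREREQ traversal starts with a full label scan to find the anchor node, which is what you feel once hop counts grow.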
The takeaway
The thing that surprised me most about this build was how much of the agent's apparent intelligence came from the infrastructure layer. The model itself is just synthesising two inputs: the knowledge graph says what to teach next, and Hindsight says what this specific student needs. Get both of those right and the agent looks remarkably smart, without any prompt engineering heroics.
If you're building anything that requires agents to learn from users over time, the memory problem is worth solving properly. We found Hindsight's agent memory approach to be the most practical starting point — it gets out of your way and lets you focus on what the agent actually does with the memory.
The repo is at github.com/khushichopra28/smart-tutor. If you have questions about the infra setup or the Neo4j schema, I'm happy to go deeper.