Bikash Kumar Sah
How Hindsight Turned Repeated Questions Into a Student Profile

The moment I knew this project was different was when Student A and Student B sent the exact same message — "What should I do this week?" — and our agent returned two responses with zero overlapping events, pulling entirely from two separate Hindsight memory banks that had never been manually configured.

What We Built
The problem sounds deceptively simple: students keep asking the same questions about campus events. The real problem underneath is that every campus system starts from zero every single time. No memory. No personalization. A student who has attended five AI workshops in a row gets the same generic event list as someone who has never left their dorm.
We built a campus assistant that fixes this by remembering. Not through a user profile form, not through explicit preference settings — through behavior. Every event attended, every announcement ignored, every club joined gets stored in Hindsight, a persistent agent memory system built by Vectorize. The agent recalls that memory before every response and uses it to rank, filter, and personalize what it says next.
The stack is straightforward: Groq running qwen/qwen3-32b as the LLM, Hindsight Cloud handling all memory storage and retrieval, FastAPI on the backend, and React on the frontend. The interesting part is entirely in the AI/ directory — specifically how memory flows from raw student actions into personalized recommendations.

The Architecture in Plain Language
AI/
├── agent/
│   └── campus_agent.py      ← orchestrates everything
├── data/
│   └── campus_data.py       ← events, clubs, deadlines
├── memory/
│   ├── hindsight.py         ← store, recall, reflect, store_entry
│   ├── interest_drift.py    ← decay calculator
│   ├── relevance.py         ← event scorer
│   ├── avoidance.py         ← silent avoidance detector
│   └── deadlines.py         ← collision checker
└── config.py

campus_agent.py is the only entrypoint the backend touches. It calls run_agent(student_id, message) and gets back a personalized response. Everything else — memory recall, interest extraction, event ranking, deadline collision — happens inside that one function call.
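To make that flow concrete, here is a minimal, self-contained sketch of the orchestration. The helper bodies below are stand-ins so the snippet runs on its own; the real implementations live in the memory/ modules, and none of these signatures should be taken as the project's actual code:

```python
# Sketch of run_agent() orchestration. Helper bodies are stand-ins
# (assumptions) for the real memory/ modules.

def recall_memory_text(bank_id: str, query: str) -> str:
    # Stand-in for the Hindsight recall in memory/hindsight.py
    return "Student attended event: AI Workshop. Category: technology."

def detect_avoided(bank_id: str) -> list:
    # Stand-in for memory/avoidance.py
    return []

def rank_events(events: list, memory_text: str, avoided: list) -> list:
    # Stand-in ranking: drop avoided categories, prefer remembered ones
    candidates = [e for e in events if e["category"] not in avoided]
    return sorted(candidates,
                  key=lambda e: e["category"] in memory_text,
                  reverse=True)

def run_agent(student_id: str, message: str) -> str:
    memory_text = recall_memory_text(student_id, message)
    events = [
        {"name": "Open Mic Night", "category": "music"},
        {"name": "LLM Builders Meetup", "category": "technology"},
    ]
    ranked = rank_events(events, memory_text, detect_avoided(student_id))
    return f"Top pick for you: {ranked[0]['name']}"
```

The backend never sees any of the intermediate steps; it calls one function and gets a string back.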

The Core Technical Story: Making Memory Do the Work
My initial instinct was to build a structured user profile. Store interests in a database table. Let students select preferences on signup. Classic approach.
The problem is students lie on preference forms, or they don't know what they want yet, or their interests shift halfway through the semester. A first-year student who ticks "technology" on day one might spend the next three months gravitating toward entrepreneurship events without ever updating their profile.
So I dropped the profile form entirely and let Hindsight's agent memory do the work instead.
Every student action gets stored as natural language in Hindsight via store_entry() in memory/hindsight.py:
def store_entry(entry: dict):
    # Turn a raw interaction record into a natural-language memory
    content = (
        f"Student {entry['action']} {entry['target_type']}: {entry['target_name']}. "
        f"Category: {entry['category']}. "
        f"Time of day: {entry['time_of_day']}. "
        f"Additional context: {entry.get('metadata', {})}"
    )
    return store_memory(
        bank_id=entry['student_id'],
        content=content
    )

No schema migrations. No foreign keys. No profile table. Just natural language pushed into a Hindsight memory bank keyed by student_id. Hindsight handles extraction, normalization, and indexing internally.
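For concreteness, here is the kind of string a single interaction becomes, using the same f-string template as above on a sample entry:

```python
entry = {
    "student_id": "student_a",
    "action": "attended",
    "target_type": "event",
    "target_name": "AI Workshop",
    "category": "technology",
    "time_of_day": "evening",
}
content = (
    f"Student {entry['action']} {entry['target_type']}: {entry['target_name']}. "
    f"Category: {entry['category']}. "
    f"Time of day: {entry['time_of_day']}. "
    f"Additional context: {entry.get('metadata', {})}"
)
print(content)
# Student attended event: AI Workshop. Category: technology. Time of day: evening. Additional context: {}
```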
Before every response, the agent recalls that memory:
memory = recall_memory(
    bank_id=student_id,
    query="student interests clubs events attended timing preferences"
)
memory_text = "\n".join([r.text for r in memory.results])

That memory_text goes directly into the system prompt. The LLM sees a rich, evolving description of who this student is — built entirely from their behavior, not their self-reported preferences.
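A sketch of how that text can be woven into the system prompt. The template wording here is illustrative, not the project's exact prompt:

```python
# Illustrative prompt assembly; the wording is an assumption,
# not the project's actual system prompt.

def build_system_prompt(memory_text: str) -> str:
    return (
        "You are a campus assistant. Personalize every answer using "
        "the student's behavioral history below.\n\n"
        f"STUDENT HISTORY:\n{memory_text}\n\n"
        "Rank recommendations by how well they match this history."
    )

prompt = build_system_prompt(
    "Student attended event: AI Workshop. Category: technology."
)
```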

Interest Drift: The Part That Surprised Me
Storing memory is easy. The harder problem is that memory goes stale.
A student who attended three AI workshops in February but hasn't touched a tech event since March shouldn't still be getting tech recommendations in April. Their interest has drifted.
I implemented a decay function in memory/interest_drift.py:
def calculate_interest_score(current_score: float,
                             days_since_last: int,
                             interacted: bool) -> float:
    decay = 0.95 ** days_since_last
    boost = 0.2 if interacted else 0
    new_score = (current_score * decay) + boost
    return round(min(new_score, 1.0), 3)

Every category score decays by 5% per day without engagement and gets a 0.2 boost when the student interacts with something in that category. It's not sophisticated — no Bayesian updating, no learned decay rates. But it works well enough that recommendations shift meaningfully over a two-week window without any explicit user input.
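To see the drift in numbers, a quick run of the same function over a two-week window:

```python
def calculate_interest_score(current_score: float,
                             days_since_last: int,
                             interacted: bool) -> float:
    decay = 0.95 ** days_since_last
    boost = 0.2 if interacted else 0
    return round(min((current_score * decay) + boost, 1.0), 3)

# A maxed-out interest with no engagement fades steadily...
week_silent = calculate_interest_score(1.0, 7, False)        # 0.698
fortnight_silent = calculate_interest_score(1.0, 14, False)  # 0.488
# ...while a single interaction claws a chunk of it back
fortnight_active = calculate_interest_score(1.0, 14, True)   # 0.688
```

Two silent weeks cut a perfect score roughly in half, which is enough movement for a new category to overtake a stale one in the rankings.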

Silent Avoidance: What Students Don't Say
The avoidance detection in memory/avoidance.py turned out to be more valuable than the interest tracking.
Students never say "I don't like sports." They just never click on sports events. Once a category racks up three ignores, the agent flags it as avoided and stops recommending it entirely:
def detect_avoidance(interaction_history: list) -> list:
    # Tally ignores per category; three or more means "avoided"
    category_ignore_count = {}
    for interaction in interaction_history:
        if interaction.get("action") == "ignored":
            cat = interaction.get("category")
            category_ignore_count[cat] = category_ignore_count.get(cat, 0) + 1
    avoided = [
        category for category, count
        in category_ignore_count.items()
        if count >= 3
    ]
    return avoided
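Running the detector on a small synthetic history shows the threshold in action:

```python
def detect_avoidance(interaction_history: list) -> list:
    # Same logic as above: tally ignores per category
    category_ignore_count = {}
    for interaction in interaction_history:
        if interaction.get("action") == "ignored":
            cat = interaction.get("category")
            category_ignore_count[cat] = category_ignore_count.get(cat, 0) + 1
    return [c for c, n in category_ignore_count.items() if n >= 3]

history = [
    {"action": "ignored", "category": "sports"},
    {"action": "attended", "category": "technology"},
    {"action": "ignored", "category": "sports"},
    {"action": "ignored", "category": "music"},
    {"action": "ignored", "category": "sports"},
]
avoided = detect_avoidance(history)  # ["sports"] — music only has one ignore
```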

This is the feature that made the two-student demo feel real. Student B never said they disliked technology. They just never attended a tech event. By Day 4, the agent had stopped recommending tech entirely — not because we told it to, but because Hindsight's memory contained enough behavioral signal to make the inference automatic.

The Two-Student Output

Here is the actual output from running both profiles against the same question:

STUDENT A — Tech/AI Focused

[PASTE YOUR STUDENT A OUTPUT HERE]

==================================================

STUDENT B — Music/Arts Focused

[PASTE YOUR STUDENT B OUTPUT HERE]

Same campus data. Same events list. Same question. The only difference is what Hindsight remembered about each student.

Deadline Collision Detection
One feature that emerged naturally from having memory was deadline awareness. The agent cross-references all upcoming deadlines in memory/deadlines.py and raises a collision warning when two or more deadlines fall within a three-day window:
collision = len([d for d in upcoming if d["days_left"] <= 3]) >= 2

When collision is true, the system prompt instructs the LLM to lead with urgency rather than recommendations. In testing, this single feature produced the most viscerally useful responses — students asking casual questions getting proactively warned about deadline collisions they hadn't thought about yet.
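An illustrative version of that check feeding the prompt. The deadline shapes and the instruction wording are assumptions for the sketch:

```python
# Hypothetical deadline records; the real ones come from memory/deadlines.py
upcoming = [
    {"name": "CS201 project", "days_left": 2},
    {"name": "Scholarship form", "days_left": 3},
    {"name": "Club fair signup", "days_left": 10},
]

# Two or more deadlines inside a three-day window trigger a collision
collision = len([d for d in upcoming if d["days_left"] <= 3]) >= 2

instruction = (
    "Lead with an urgent deadline warning before any recommendations."
    if collision else
    "Answer normally."
)
```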

What Hindsight Got Right That I Would Have Built Wrong
If I had built memory myself, I would have stood up a vector database, stored embeddings, written retrieval logic, and spent a week debugging why similar events weren't surfacing correctly.
Hindsight runs four retrieval strategies in parallel — semantic, keyword, graph, and temporal — and merges them with reciprocal rank fusion before returning results. I got all of that for a few lines of code:
client = Hindsight(
    base_url="https://api.hindsight.vectorize.io",
    api_key=HINDSIGHT_API_KEY
)
result = client.recall(bank_id=student_id, query=query)

The temporal retrieval specifically was something I would never have built myself in a hackathon timeline. Hindsight correctly surfaces that a student attended an event "last Tuesday evening" when the query is time-relative. That context flows into the system prompt and makes responses feel grounded in actual history rather than vague profile data.
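Hindsight handles the fusion internally, but for intuition, here is a minimal reciprocal rank fusion merge — a standard formulation, not Hindsight's actual implementation:

```python
def rrf_merge(rankings: list, k: int = 60) -> list:
    # Reciprocal rank fusion: each strategy contributes 1 / (k + rank)
    # for every item it returned; items ranked well by several
    # strategies accumulate the highest combined score.
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=scores.get, reverse=True)

# Hypothetical per-strategy result lists
semantic = ["m3", "m1", "m2"]
keyword = ["m1", "m4"]
temporal = ["m1", "m3"]

merged = rrf_merge([semantic, keyword, temporal])
# m1 appears in all three lists, so it surfaces first
```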

Lessons Learned
Let behavior define the profile, not the form. Explicit preference inputs are noisy and stale. Behavioral signals stored in Hindsight produced richer, more accurate profiles with zero user friction.
Memory staleness is a first-class problem. Storing memory is trivial. Knowing when memory is no longer relevant is the hard part. The decay function is simple but necessary — without it, early interactions dominate forever.
Duplicate memory accumulates fast. Running the same test multiple times filled Hindsight with redundant entries. In production, check for semantic similarity before storing or Hindsight's recall results will be dominated by repeated observations.
The system prompt is where memory becomes intelligence. Hindsight recall gives you raw facts. The system prompt is where you decide which facts matter, in what order, and how the LLM should weight them. Spending time on prompt structure paid off more than any other single change.
Two memory banks beat one shared context. Keeping each student's memory in a separate bank_id meant isolation was free. No filtering logic, no user-scoping queries, no risk of one student's data bleeding into another's recommendations.
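On the duplicate-memory lesson: a pre-store guard is easy to sketch. This version uses lexical similarity from the standard library as a cheap stand-in; a production check would compare embedding cosine similarity instead:

```python
from difflib import SequenceMatcher

def is_duplicate(new_content: str, existing: list, threshold: float = 0.9) -> bool:
    # Lexical stand-in for semantic similarity; swap in embedding
    # cosine similarity for a real deployment.
    return any(
        SequenceMatcher(None, new_content, prior).ratio() >= threshold
        for prior in existing
    )

stored = ["Student attended event: AI Workshop. Category: technology."]
dup = is_duplicate(
    "Student attended event: AI Workshop. Category: technology.", stored
)
fresh = is_duplicate(
    "Student joined club: Jazz Band. Category: music.", stored
)
```

Only entries that clear the novelty check get pushed to Hindsight, which keeps recall results from being dominated by repeated observations.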

What's Next
The current implementation extracts interests from memory text using keyword matching — functional but brittle. The next step is using Hindsight's reflect() operation to generate disposition-aware summaries of each student's profile, which the agent can use directly without any keyword extraction logic.
The campus data is still static. Once the backend connects live database queries, the event ranking will work against real-time data and the personalization loop will be complete.
The foundation is solid. Memory is the feature — everything else is plumbing.

Top comments (1)

Adarsh Kant

This resonates deeply. The "every system starts from zero" problem is everywhere — not just campus assistants but across the entire web experience.

We're tackling a similar challenge with AnveVoice (anvevoice.app) — a voice AI agent that takes real DOM actions on websites (clicking buttons, filling forms, navigating pages). The memory layer matters enormously when your agent is actually executing actions rather than just generating text. If a user asks the same thing twice, the agent needs to remember not just what they asked, but what actions it took and what context made those actions correct.

Your approach of building profiles through behavior rather than explicit preferences is the right call. Users rarely fill out preference forms, but their interaction patterns reveal everything. We found the same thing — tracking which features users invoke by voice gave us better personalization signals than any onboarding survey ever could.

Curious about your vector similarity threshold for matching. How are you handling the cold-start problem for brand new students with no behavioral history?