# An Engineer’s Diary: Building a Proactive Campus AI with Structured Memory
“Why is it recommending robotics again?” I checked the logs, and that’s when I realized the system wasn’t repeating itself — it was learning from a pattern I hadn’t even noticed.
That moment reframed the entire project for me. We hadn’t set out to build something that learns; we’d set out to build a better campus dashboard. But somewhere between `student_history.json` and a ten-line drift detector, we accidentally built an agent that knew a student’s interests better than she had articulated them herself.
## What Smart Campus Actually Does
Smart Campus is a Streamlit-based student assistant. The surface looks simple: a dashboard, an event feed, a map, a chat interface. But the interesting part is what happens before the student types anything.
When the app loads, it generates a Morning Briefing — a personalized, proactive summary of today’s relevant events, upcoming deadlines, and opportunities. Nobody requested it. The agent produced it by reading the student’s full history and cross-referencing it against today’s campus data. That behavior — acting before being asked — is the entire thesis of the project.
The stack is deliberately minimal: Python, Streamlit for the UI, Gemini 1.5 Flash or GPT-4o-mini as the LLM backend, and Hindsight as the agent-memory layer that makes proactive behavior possible without bloating every prompt with raw data.
## The Memory Schema Came Before the Prompts
The first mistake I made — and fixed — was writing LLM prompts before designing the memory schema.
The initial version stuffed the entire student_history.json into the system prompt as raw JSON. Token count ballooned. The LLM started hallucinating connections between unrelated events because it was drowning in undifferentiated data. Response latency doubled. It was the classic mistake: treating memory as a context window problem instead of a retrieval design problem.
The `interest_timeline` field is the clearest example of schema-first thinking:
```json
"interest_timeline": [
  { "period": "2022-2023", "dominant": "Robotics", "secondary": "Drones" },
  { "period": "2023-2024", "dominant": "Drones", "secondary": "Robotics" },
  { "period": "2024-2025", "dominant": "FinTech/Trading", "secondary": "Drones" }
]
```
This isn’t a log. It’s a summarized trajectory. Each period captures the dominant and secondary interest based on the events attended in that window. The shape of this data is what makes drift detection possible in ten lines of Python.
## The Drift Detector
The drift detection logic is embarrassingly simple, which is exactly why it works:
```python
def detect_drift(student):
    tl = student.get('interest_timeline', [])
    if len(tl) < 2:
        return {'drift': False, 'old': None, 'new': None}
    old, new = tl[-2], tl[-1]
    return {
        'drift': old['dominant'] != new['dominant'],
        'old': old['dominant'],
        'new': new['dominant']
    }
```
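Fed the three-period timeline from earlier, the detector flags the most recent transition. A minimal sanity check (the function is repeated verbatim so the snippet runs standalone; the `aditi` dict is sample data):

```python
# detect_drift repeated verbatim so this snippet runs standalone.
def detect_drift(student):
    tl = student.get('interest_timeline', [])
    if len(tl) < 2:
        return {'drift': False, 'old': None, 'new': None}
    old, new = tl[-2], tl[-1]
    return {'drift': old['dominant'] != new['dominant'],
            'old': old['dominant'], 'new': new['dominant']}

# Sample profile mirroring the interest_timeline shown above.
aditi = {
    "interest_timeline": [
        {"period": "2022-2023", "dominant": "Robotics", "secondary": "Drones"},
        {"period": "2023-2024", "dominant": "Drones", "secondary": "Robotics"},
        {"period": "2024-2025", "dominant": "FinTech/Trading", "secondary": "Drones"},
    ]
}

print(detect_drift(aditi))
# {'drift': True, 'old': 'Drones', 'new': 'FinTech/Trading'}
```

Only the last two periods are compared, so the detector tracks the freshest shift rather than averaging over the whole history.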
What made this non-trivial wasn’t the detection — it was deciding what to do with it. Surfacing the drift explicitly felt surveillance-like. One tester literally said “that’s creepy.” The agent was right, but being right in a way that made people uncomfortable wasn’t useful.
The fix was a single constraint added to the system prompt when drift is detected:
```python
drift_note = (
    f"\nINTEREST DRIFT: {drift['old']} -> {drift['new']}. "
    "Silently adapt all recommendations. Do NOT announce the drift."
    if drift["drift"] else ""
)
```
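At prompt-build time the note is simply concatenated onto the base persona. A sketch of that wiring (`drift_aware_prompt` and the base wording are illustrative, not the project’s actual code):

```python
def drift_aware_prompt(base_prompt, drift):
    # Append the silent-adaptation constraint only when drift was detected.
    drift_note = (
        f"\nINTEREST DRIFT: {drift['old']} -> {drift['new']}. "
        "Silently adapt all recommendations. Do NOT announce the drift."
        if drift["drift"] else ""
    )
    return base_prompt + drift_note

base = "You are Smart Campus, a proactive assistant for this student."
print(drift_aware_prompt(base, {"drift": True, "old": "Drones",
                                "new": "FinTech/Trading"}))
```

The LLM never sees the timeline itself, only the one-line constraint derived from it.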
## How Recall Works: Scoring Events Against Memory
The other half of the memory system is `relevant_events()` — the function that decides which of today’s campus events are worth surfacing to this specific student:
```python
def relevant_events(student, campus):
    skills = student.get('skills', {})
    all_int = (
        [e['domain'] for e in student.get('events_attended', [])] +
        skills.get('primary', []) + skills.get('emerging', [])
    )
    attended = [e['event'] for e in student.get('events_attended', [])]
    scored = []
    for ev in campus.get('events', []):
        score = sum(2 for t in ev.get('relevance_triggers', [])
                    for i in all_int
                    if t.lower() in i.lower() or i.lower() in t.lower())
        score += sum(3 for t in ev.get('relevance_triggers', [])
                     for a in attended
                     if t.lower() in a.lower())
        if score > 0:
            scored.append((score, ev))
    scored.sort(key=lambda x: x[0], reverse=True)
    return [e for _, e in scored]
```
Past wins and attended events score at weight 3 versus general interests at weight 2. This isn’t arbitrary — a student who won a hackathon has demonstrated real commitment in that domain. That signal deserves more weight than a skill tag someone filled in on a form.
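To make the weighting concrete, here is the score a single event would pick up under the same matching rules (hypothetical sample data; the substring checks and 2/3 weights mirror `relevant_events` above):

```python
# Hypothetical student signals and one event's triggers, scored with the
# same rules as relevant_events.
all_int = ["FinTech", "Drones", "Python"]    # interests: weight 2 per match
attended = ["SIH 2025 FinTech Hackathon"]    # attended events: weight 3 per match
triggers = ["fintech", "hackathon"]          # the event's relevance_triggers

score = sum(2 for t in triggers for i in all_int
            if t.lower() in i.lower() or i.lower() in t.lower())
score += sum(3 for t in triggers for a in attended
             if t.lower() in a.lower())
print(score)  # 8: one interest match (2) plus two attended matches (3 + 3)
```

A student who merely lists FinTech as a skill gets 2 points toward a FinTech event; one who attended a FinTech hackathon compounds to 8, which is what pushes those events to the top of the sorted list.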
## The System Prompt as Curated Memory, Not Data Dump
The temptation with rich student data is to include everything. Resist it. The `build_system_prompt()` function is deliberately sparse — no raw JSON, no full event history. Just the curated, relevant summary: who this student is, what they’ve done, what drift exists, what’s relevant today.
This is the core principle behind structured agent memory: the value isn’t in storing everything — it’s in recalling the right things at the right time in the right shape.
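The post doesn’t reproduce `build_system_prompt()` in full, so here is a minimal sketch of its shape under that principle (the `name` fields on students and events, and the exact wording, are assumptions):

```python
def build_system_prompt(student, drift, todays_events):
    # Curated recall only: identity, current trajectory, drift constraint,
    # and today's shortlist. No raw JSON, no full event history.
    current = student['interest_timeline'][-1]['dominant']
    lines = [
        f"You are Smart Campus, a proactive assistant for {student['name']}.",
        f"Current dominant interest: {current}.",
    ]
    if drift['drift']:
        lines.append(
            f"INTEREST DRIFT: {drift['old']} -> {drift['new']}. "
            "Silently adapt all recommendations. Do NOT announce the drift."
        )
    lines.append("Today's relevant events: "
                 + "; ".join(ev['name'] for ev in todays_events))
    return "\n".join(lines)

prompt = build_system_prompt(
    {"name": "Aditi",
     "interest_timeline": [{"period": "2024-2025",
                            "dominant": "FinTech/Trading",
                            "secondary": "Drones"}]},
    {"drift": True, "old": "Drones", "new": "FinTech/Trading"},
    [{"name": "Python for Quant Finance"}],
)
```

Note that the prompt carries conclusions drawn from the data (current interest, drift, shortlist), never the data itself.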
## What It Actually Looks Like at Runtime
When Aditi opens Smart Campus on a Tuesday morning, this is what the agent generates without a single keystroke:
```text
Good morning, Aditi! AI-curated briefing for 2025-07-15

🏆 Your SIH 2025 Hackathon win is opening doors — check the Innovation Mixer today.

📅 Innovation Lab Networking Mixer · 5:00 PM · Block C
📅 FinTech Hackathon Registration Deadline · 11:59 PM · Online
📅 Python for Quant Finance · 6:00 PM · CS Lab 3

⚠️ FinTech Hackathon Team Submission — due today
```
Before adding the `interest_timeline` structure, the agent would surface the Robotics Build Session every single day. With the timeline, the agent understands that Robotics is historical and FinTech is current. Same data, completely different behavior, just from adding temporal structure to the memory schema.
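Gluing the pieces together, the briefing path on page load is roughly the following (a sketch only: the stubbed helpers stand in for the real functions shown earlier, and `call_llm` stands in for the actual Gemini/GPT call):

```python
# Stubs so the sketch runs standalone; in the app these are the real
# detect_drift / relevant_events / build_system_prompt shown earlier.
def detect_drift(student):
    return {'drift': False, 'old': None, 'new': None}

def relevant_events(student, campus):
    return campus.get('events', [])

def build_system_prompt(student, drift, events):
    return "curated system prompt"

def morning_briefing(student, campus, call_llm):
    # The proactive path: runs before the student types anything.
    drift = detect_drift(student)
    events = relevant_events(student, campus)
    prompt = build_system_prompt(student, drift, events[:5])  # cap the shortlist
    return call_llm(prompt, "Generate today's Morning Briefing.")

fake_llm = lambda system, user: f"[briefing from: {system}]"
briefing = morning_briefing({}, {"events": []}, fake_llm)
```

The LLM call is the last step, not the first — everything interesting happens in the three functions before it.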
## Lessons Worth Keeping
- **Design the memory schema before writing a single prompt.** The data shape determines what’s queryable. `interest_timeline` as a period-by-period object unlocked drift detection in ten lines.
- **Silent behavioral adaptation beats announced behavioral adaptation.** “I noticed you’re interested in FinTech now” is creepy. Just starting to recommend FinTech events is smart.
- **Weight signal quality, not just signal presence.** A hackathon win is a stronger signal than a skill tag. Encoding that distinction in the scoring function (3 vs 2) meaningfully changed recommendation relevance.
- **Curated recall beats full context injection.** Sending the entire student profile to the LLM every time degraded quality and increased latency. The selection logic is the intelligence — the LLM is just the reasoning layer on top.
- **The proactive surface was the product.** Nobody asked for the Morning Briefing. When we added it, it became the feature everyone wanted to show in demos. Agents that wait to be asked are less compelling than agents that already know.
The missing layer between “chatbot with history” and “agent that actually knows you” is structured memory managed by something like Hindsight. The flat JSON approach works at demo scale. For anything real, you want a proper retain/recall cycle.
## Resources
- Hindsight on GitHub
- Hindsight Documentation
- Vectorize Agent Memory