The first version of our agent gave terrible advice. Not wrong exactly — just useless. Generic. The kind of output you'd get if you asked a stranger who knew nothing about you.
The problem wasn't the model. It was what I was feeding it.
What We Built
Retrospect is a personal decision memory agent. You log real decisions — what you decided, the category, the outcome, what actually happened — and when you face a new decision, the agent recalls your most relevant past ones and reasons over them using Gemini 2.5 Flash.
The idea is simple: most bad decisions aren't original. They're reruns. Retrospect gives you an agent that remembers the first time.
My job was the memory layer — specifically, getting Hindsight working as the core retrieval system. What I thought would take a few hours ended up teaching me something fundamental about how context quality determines output quality.
The First Approach: Recency
Before Hindsight, I was pulling the user's last 5 decisions from SQLite and injecting them directly into the Gemini prompt. Simple, fast, no external dependencies.
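A simplified sketch of that first approach, with an in-memory array standing in for the SQLite table (the names and shapes here are illustrative, not the production code):

```typescript
interface Decision {
  text: string;
  category: string;
  createdAt: number; // unix epoch millis
}

// Recency-based context: sort by timestamp, take the newest N.
// In the real app this was a SQL ORDER BY ... LIMIT 5 against SQLite.
function lastNDecisions(decisions: Decision[], n: number): Decision[] {
  return [...decisions]
    .sort((a, b) => b.createdAt - a.createdAt)
    .slice(0, n);
}

// Inject the recent decisions directly into the prompt text.
function buildPrompt(question: string, context: Decision[]): string {
  const history = context
    .map((d) => `- [${d.category}] ${d.text}`)
    .join("\n");
  return `Past decisions:\n${history}\n\nQuestion: ${question}`;
}
```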
It worked in the demo. It failed in practice.
The issue is that "last 5 decisions" has nothing to do with "most relevant 5 decisions." A user who's been logging decisions for a few weeks might ask "should I hire a freelancer?" and get context that's entirely about their pricing and marketing decisions from the past week — because that's what they logged most recently. Gemini would try to reason from irrelevant history and produce hedged, generic output.
The context was correct syntactically. It was wrong semantically.
The Shift: Semantic Recall
I needed a way to give my agent memory that retrieved by relevance, not recency. That's when I integrated Hindsight agent memory.
The wrapper I built handles lazy initialisation and falls back gracefully if the API key is missing:
// src/lib/hindsight.ts
let hindsightClient: any = null;

async function getHindsight() {
  if (hindsightClient) return hindsightClient;

  const apiKey = process.env.HINDSIGHT_API_KEY;
  if (!apiKey) {
    console.warn("HINDSIGHT_API_KEY not set — using mock memory store");
    return null;
  }

  try {
    const { Hindsight } = await import("@vectorize-io/hindsight");
    hindsightClient = new Hindsight({ apiKey });
    return hindsightClient;
  } catch (e) {
    console.warn("Hindsight SDK not available", e);
    return null;
  }
}
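The `retainMemory` and `recallMemories` helpers used below are thin wrappers over that client. Here is a sketch of their shape showing only the keyword-matching fallback path; the real SDK calls are elided, and these signatures are my reconstruction rather than the Hindsight API:

```typescript
interface MemoryRecord {
  content: string;
  metadata: Record<string, string>;
}

// Fallback store used when getHindsight() returns null.
const localStore: MemoryRecord[] = [];

// Retain: try the real client first, otherwise keep the record locally.
async function retainMemory(record: MemoryRecord): Promise<void> {
  // const client = await getHindsight();
  // if (client) { /* forward to the Hindsight SDK */ return; }
  localStore.push(record);
}

// Recall: naive keyword overlap between the query and stored content,
// scoped to one user and ranked by how many query terms match.
async function recallMemories(opts: {
  query: string;
  topK: number;
  filter: { userId: string };
}): Promise<MemoryRecord[]> {
  const terms = opts.query.toLowerCase().split(/\W+/).filter(Boolean);
  return localStore
    .filter((m) => m.metadata.userId === opts.filter.userId)
    .map((m) => ({
      m,
      score: terms.filter((t) => m.content.toLowerCase().includes(t)).length,
    }))
    .filter((x) => x.score > 0)
    .sort((a, b) => b.score - a.score)
    .slice(0, opts.topK)
    .map((x) => x.m);
}
```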
Every decision gets retained with structured content — not just the decision text, but category, outcome, and result baked into the string:
await retainMemory({
  content: `Decision: ${decision}. Category: ${category}. Outcome: ${outcome}. Result: ${result}. Date: ${date}`,
  metadata: { userId, category, outcome, date },
});
And every agent query pulls semantically relevant memories before Gemini touches anything:
const memories = await recallMemories({
  query: question,
  topK: 5,
  filter: { userId: payload.userId },
});
The difference in output quality was immediate. "Should I hire a freelancer?" now pulls past hiring decisions, contractor experiences, and delegation outcomes — not whatever the user happened to log most recently.
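The recalled memories then get folded into the prompt before the Gemini call. A sketch of that step, where the field names are assumptions about the recall response shape rather than documented SDK types:

```typescript
interface RecalledMemory {
  content: string;
  score?: number; // similarity score, if the SDK returns one
}

// Build the final prompt: relevant history first, then the new question.
// An empty recall is stated explicitly so the model doesn't invent history.
function buildAgentPrompt(
  question: string,
  memories: RecalledMemory[]
): string {
  const context = memories.length
    ? memories.map((m, i) => `${i + 1}. ${m.content}`).join("\n")
    : "(no relevant past decisions found)";
  return [
    "You are a decision advisor. Reason only from the user's own history below.",
    "Relevant past decisions:",
    context,
    `New decision: ${question}`,
  ].join("\n\n");
}
```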
The Content String Is Everything
The single most impactful thing I did was restructure what gets retained. Early versions stored just the decision text. Recall was shallow and imprecise.
Adding category, outcome, and result to the content string changed everything — because the embedding now captures the full context of the decision, not just its surface description. A decision logged as "I hired a freelancer for logo design — Hiring — Negative — Delivered late, poor quality, wasted ₹2000" recalls completely differently than one logged as "hired someone for logo."
The retrieval quality ceiling is set by what you retain, not by how clever your recall query is. Treat the content string like a database schema. Every field you add improves semantic precision.
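That schema discipline is easy to enforce with a single builder function. A minimal sketch, with illustrative field names matching the format shown above:

```typescript
interface DecisionRecord {
  decision: string;
  category: string;
  outcome: "Positive" | "Negative" | "Mixed";
  result: string;
  date: string; // ISO date
}

// Every field added here sharpens the embedding, so recall gets more
// precise without touching the recall query at all.
function toContentString(d: DecisionRecord): string {
  return `Decision: ${d.decision}. Category: ${d.category}. Outcome: ${d.outcome}. Result: ${d.result}. Date: ${d.date}`;
}
```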
What the System Looks Like Running
After 17 decisions, the dashboard shows 13 patterns found — each one representing a Hindsight recall event. The Agent Intelligence card on the right is powered by a live recall: it detected that organic content consistently outperforms paid advertising across this user's history.
That detection didn't come from a hardcoded rule. It came from Hindsight pulling the right memories and Gemini identifying the pattern across them.
What I Learned
Recency and relevance are not the same thing. This sounds obvious, but the default implementation for almost every "give the LLM context" problem is to pull recent records. Recent is easy. Relevant is the actual problem. Semantic recall is not a nice-to-have for memory-powered agents — it's the entire point.
The content string is more important than the prompt. I spent more time tuning the Gemini system prompt than I spent on what I was retaining into Hindsight. That was the wrong allocation. Garbage in, garbage out — no matter how good your prompt engineering is.
Build the fallback before you need it. The in-memory keyword-matching fallback meant I could develop the full agent flow without a live API key. It saved hours and caught two edge cases I would have missed otherwise.
The agent is genuinely more useful at 17 decisions than it was at 5. That's the point of memory — it compounds. And semantic recall is what makes the compounding work.
