Rakshitha H K
We Replaced Our Standup Notes With an AI That Actually Remembers Them

We built a group project manager that learns from every sprint decision. I didn't believe the memory was actually doing anything — until the agent started justifying assignments with history I'd forgotten I'd logged.

That's the part that got me. Not that it remembered. That it remembered things I had already forgotten.


The Standup Problem Nobody Talks About

Every team has a version of this: you hold a standup, someone mentions a blocker, someone else says they're close to done, the coordinator makes a note. Then the meeting ends, the note goes into a doc nobody re-reads, and three days later you're in the next standup repeating half the same conversation because nobody remembered what was decided.

We weren't trying to fix standups. We were building an AI project manager for a hackathon — a Streamlit app that could answer natural-language questions about our three-person team: Dexter (AI/Backend), Rahul (Frontend), and Rakshitha H K (Project Coordinator). Ask it who should own a task, get back a justified recommendation. Simple enough.

The problem appeared almost immediately. The agent was good at reasoning — Groq's qwen/qwen3-32b is surprisingly capable when you give it clean context. But it had the memory of a goldfish. Every new session, everything we'd discussed in the last standup was gone. It was making decisions from a frozen snapshot of the team — roles, names, current task titles — with none of the actual history that makes a recommendation trustworthy.

It kept recommending Dexter for backend tasks. Technically correct. Also completely blind to the fact that Dexter had been stuck on the same pipeline for two weeks and was already at risk of missing the sprint deadline.

That's not a PM. That's a name randomizer with extra steps.


What We Actually Wanted

We wanted the agent to behave like a PM who had been sitting in every standup for the last month. Not one who had read the meeting notes — one who had been there, noticed the patterns, and could connect the dots without being asked.

The difference matters. A PM who read the notes knows Dexter is working on the fine-tuning pipeline. A PM who was there knows that Dexter flagged scope creep two weeks ago, that the estimate slipped once already, and that adding anything to his plate right now is a bad idea regardless of what his role description says.

That kind of reasoning requires memory that persists across sessions, accumulates over time, and can be queried semantically — not a bigger system prompt you manually update before each conversation.

That's what led us to Hindsight. It's an open-source agent memory layer built for exactly this: retain experiences, recall relevant ones, let the agent learn from its own history.


How We Plugged It In

The integration is simpler than it sounds. Hindsight sits as a layer between the user's message and the LLM call. Before we hit Groq, we query the memory bank. After Groq responds, we write the decision back.

Here's the full loop in app.py:

def run_hindsight_loop(user_message: str, groq_client) -> tuple[str, str]:
    # 1. Recall — pull relevant history before touching the LLM
    recall_query = f"team member performance and task history relevant to: {user_message}"
    memories = recall_team_memory(recall_query)

    # 2. Inject — memories go directly into the system prompt
    messages = build_agent_prompt(user_message, memories)

    # 3. Decide — run the LLM with memory-enriched context
    ai_response = run_agent(groq_client, messages)

    # 4. Retain — write this decision back into Hindsight
    retain_interaction(user_message, ai_response)

    # Return the recalled snippet alongside the answer so the UI can surface it
    return ai_response, memories

The memory injection happens in build_agent_prompt — the retrieved memories drop into a ## Relevant Team Memory block in the system prompt. The LLM sees it as context, not as a special instruction. It just… uses it.
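A minimal sketch of what that injection can look like. The exact prompt wording and helper signature in the repo may differ; this just shows the shape — recalled memories land in a `## Relevant Team Memory` block inside an otherwise ordinary system prompt:

```python
def build_agent_prompt(user_message: str, memories: str) -> list[dict]:
    """Assemble chat messages, injecting recalled memories as plain context."""
    system_prompt = (
        "You are the AI project manager for a three-person team. "
        "Recommend task owners and justify your reasoning.\n\n"
        "## Relevant Team Memory\n"
        f"{memories or 'No relevant memories yet.'}\n"
    )
    return [
        {"role": "system", "content": system_prompt},
        {"role": "user", "content": user_message},
    ]
```

Because the memories arrive as ordinary context rather than a special instruction, nothing else in the pipeline has to change — the LLM call stays a standard chat completion.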

What we store per interaction:

# ai_safe is the sanitized model response; HINDSIGHT_BANK = "project-manager-v1"
record = (
    f"Project manager decision — User request: '{user_message}' | "
    f"AI recommendation: '{ai_safe[:400]}'"
)
client.retain(
    bank_id=HINDSIGHT_BANK,
    content=record,
    context="project-manager-decision",
    timestamp=timestamp,  # when the decision was made
)

Every PM decision becomes an experience: what was asked, what was recommended, when. The memory bank is called project-manager-v1. Queries are natural language — we don't tag or categorize records manually. Hindsight's semantic retrieval handles relevance.

One implementation detail that cost us an hour: don't cache the Hindsight client in Streamlit. A shared client kept hitting asyncio timeout errors against Streamlit's already-running event loop. We fixed it by creating a fresh local client inside every recall and retain call. Tiny overhead, no more debugging.
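The lifetime pattern, sketched with a stand-in client class. `FakeHindsightClient` is a hypothetical stub for illustration only; the real client's constructor and method names are assumptions based on the snippets above, so check the Hindsight docs for the actual API:

```python
class FakeHindsightClient:
    """Stand-in for the real client, just to illustrate the lifetime pattern."""
    instances = 0

    def __init__(self):
        FakeHindsightClient.instances += 1  # each call site gets its own instance

    def recall(self, bank_id: str, query: str) -> str:
        return f"memories for: {query}"


def recall_team_memory(query: str) -> str:
    # Fresh client per call. Never stash this in st.session_state or behind
    # st.cache_resource — sharing one instance across Streamlit reruns is what
    # triggered the event-loop timeouts for us.
    client = FakeHindsightClient()
    return client.recall(bank_id="project-manager-v1", query=query)
```

The retain path follows the same rule: construct locally, use, discard. Client construction is cheap compared to the LLM round trip, so the overhead never showed up in practice.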

The Hindsight docs and Vectorize's agent memory page have more detail on how the retrieval layer works under the hood.


The Moment I Stopped Being Skeptical

About a day in, I typed: "Who should own the API integration task?"

The agent's response came back with a recommendation for Dexter — but then it did something I wasn't expecting. It added a caveat. It noted that based on prior context, Dexter's current pipeline task had already taken longer than estimated, and that adding a second backend task this sprint would likely create a bottleneck. It suggested checking his current completion percentage before confirming the assignment.

I stared at that for a second. Then I scrolled back through my own memory of what we'd discussed over the last day of building.

We had talked about Dexter's timeline slipping. I had asked the agent about it in a previous session. The agent had stored that conversation — and now, three sessions later, it was citing it back to me as evidence.

I had literally forgotten I'd logged that concern. The agent hadn't.

That's the moment I stopped treating Hindsight as a demo feature and started thinking of it as load-bearing.


What the App Looks Like

The Streamlit interface has two main areas. The sidebar shows each team member as a card — name, role, current task, status badge (Pending / In Progress / Done), and a progress bar. Below the team cards is a Sprint Stats section and a live Hindsight memory status indicator showing whether the memory bank is active.

![The AI Group Project Manager welcome screen — sidebar shows team member cards with task status, main area has quick-action suggestion buttons]

The welcome screen. The sidebar is the static layer — roles and current tasks. Everything the agent learns over time lives in Hindsight, invisible here but active in every recommendation.

The main chat area is a standard message interface. You type a question, the agent responds. What's different is the "Memory Note" at the bottom of each response — the agent explicitly flags what it's writing back to Hindsight from this interaction.

![The AI PM responding to a task assignment request — recommends Rahul for the login UI with reasoning, ends with a Memory Note about the decision]

Every response ends with a Memory Note. The agent isn't just answering — it's deciding what to remember. That feedback loop is what makes the recommendations improve over time.


The Sidebar Tells You What the Agent Can't See (Yet)

Here's the useful way to think about what Hindsight adds. The sidebar shows the current state: who's working on what, at what completion percentage, with what status.

![The team sidebar closeup — Dexter In Progress on model fine-tuning pipeline, Rahul Pending on dashboard redesign, Sohan Done on retrospective]

The sidebar shows the frozen snapshot. Hindsight adds the timeline — why Dexter's task has been in progress for two weeks, what Rahul mentioned about his sprint last session, what Sohan flagged as a blocker.

What the sidebar can't tell you: why Dexter's been at 65% for two weeks. Whether that CSS conflict Rahul mentioned last session ever got resolved. What commitments Sohan made in the last standup that might affect her bandwidth.

That's the timeline. That's what standups capture and what nobody ever reads again. Hindsight keeps it — and the agent can actually use it.


One Dead End Worth Knowing About

We initially tried to get clever with the recall query. Instead of a natural-language description of the current request, we were constructing structured queries with explicit field names — trying to match against the way we stored records.

It didn't work better. It worked worse. The recall query and the stored content were in different "shapes" and the semantic similarity scores were lower than when we just used plain language on both ends.

The fix was counterintuitive: keep both the recall query and the stored record as natural, readable sentences. Don't over-engineer the structure. The retrieval works best when the query sounds like something a human would ask and the stored content sounds like something a human would write.

We spent half a session optimizing the wrong thing. If you're building with Hindsight: write your records the way you'd write a meeting note, and query the way you'd ask a colleague.
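A crude way to see why the shapes matter. Real semantic retrieval uses embeddings, not word overlap, but even a bag-of-words comparison shows the structured query sharing almost no surface with a record written as a sentence. All three strings below are invented examples in the style described above:

```python
import re

def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap on bare words — a rough stand-in for similarity."""
    ta = set(re.findall(r"[a-z0-9]+", a.lower()))
    tb = set(re.findall(r"[a-z0-9]+", b.lower()))
    return len(ta & tb) / len(ta | tb)

# A stored record, written as a readable sentence
stored = ("Project manager decision — User request: 'Who should own the API "
          "integration task?' | AI recommendation: 'Assign to Dexter'")

# The structured query style we tried first — a different shape entirely
structured = "member:dexter field:task_type value:backend status:open"

# The natural-language style that actually matched
natural = ("team member performance and task history relevant to: "
           "who should own the API integration task?")
```

Run the comparison and the natural query overlaps the stored record far more than the structured one does — the same asymmetry, amplified, is what we saw in the semantic similarity scores.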


The Non-Obvious Takeaway

The thing we kept expecting was for Hindsight to surface big, dramatic remembered facts. "You said Dexter was struggling three weeks ago." That kind of explicit recall.

What actually happened was subtler and more useful. The agent's confidence calibration changed. It started hedging more appropriately on assignments where history suggested risk, and recommending more confidently on assignments with a clean track record. It wasn't citing specific memories in every response — it was using them to modulate how certain it sounded.

That's the thing nobody writes about when they talk about agent memory. It's not just about the agent knowing more facts. It's about the agent knowing when to be less sure.

A project manager who's never been wrong is useless. One who's been paying attention — and learned from it — is the one you actually want making the call.

If you want to build something similar, the Open Source Agent Memory layer we used is on GitHub. The integration is lighter than you'd expect.


Full project code: https://github.com/Sohan4-c/hindsight-manager
