DEV Community

Sinchana Nagaraj

Our AI PM Remembered Task Owners Without Being Told Twice



 "Wait — it actually remembered that?" Keerthana stared at the screen as our agent recalled a task assignment from a session we'd closed an hour ago, without us mentioning it once.

That moment was the whole point of what we built. And honestly, we didn't fully expect it to work that cleanly.

What We Built

Over 8 hours, our three-person team — Sinchana Nagaraj, Keerthana N, and P Sahana — built an AI Group Project Manager powered by Hindsight agent memory and Groq's LLM. The idea was simple: most AI agents are amnesiac. Every new session, they start from zero. For a group project tool, that's a dealbreaker. You don't want to re-explain who owns what every time you open a chat.

So we built something that remembers.

The stack is straightforward: a FastAPI backend, a browser-based chat UI, Groq's llama-3.3-70b-versatile for the LLM, and Hindsight as the memory layer. The agent can assign tasks, log team decisions, summarise workloads, and answer questions about project status — all grounded in memory it has accumulated across previous sessions.

The Core Loop

The architecture has two moving parts: agent.py handles all Hindsight and LLM logic, and main.py exposes FastAPI routes that the chat UI calls.

Every meaningful interaction goes through the same loop:

  1. Recall relevant memories from Hindsight before calling the LLM
  2. Inject those memories as context into the system prompt
  3. Call the LLM with that context
  4. Retain the interaction back into Hindsight for future recall

Here's what that looks like for a chat message:

async def chat(self, user: str, message: str) -> str:
    # Step 1: recall relevant memories
    context = await self._recall_context(message)

    # Step 2: inject into system prompt
    system = f"""You are an AI Group Project Manager.
Team members: {', '.join(TEAM_MEMBERS)}.

Project memory recalled from Hindsight:
{context}

Use this memory to give accurate, personalised answers."""

    # Step 3: call LLM
    response = self._llm(system=system, user=f"{user} asks: {message}")

    # Step 4: retain this interaction
    await self._retain(
        content=f"Chat from {user}: '{message}' — Agent replied: '{response[:200]}'",
        context="chat"
    )
    return response

The _recall_context method calls Hindsight's arecall() with the user's message as the query. Hindsight runs four search strategies in parallel — semantic, keyword, graph, and temporal — and returns the most relevant memories. Those memories become the LLM's context window for that turn.
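To make that concrete, here's a minimal sketch of the formatting half of `_recall_context` — turning whatever `arecall()` returns into the context string that gets interpolated into the system prompt. The result shape (records with a `content` field) is an assumption for illustration, not Hindsight's documented schema.

```python
def format_recalled_memories(results: list[dict], limit: int = 10) -> str:
    """Turn recalled memory records into a bulleted context string for the prompt.

    Assumes each record is a dict with a 'content' field — adjust to match
    whatever shape your Hindsight client actually returns.
    """
    if not results:
        return "No relevant project memory yet."
    # Cap the number of memories so the context stays within the prompt budget.
    lines = [f"- {r['content']}" for r in results[:limit]]
    return "\n".join(lines)
```

The empty-bank fallback matters: on a fresh bank, an explicit "no memory yet" line keeps the LLM from hallucinating prior context.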

What We Actually Store

We retain three categories of memory, each with a descriptive context label:

Task assignments:

await self._retain(
    content=(
        f"Task assigned to {assigned_to}: '{task}'. "
        f"Deadline: {deadline}. Status: PENDING. "
        f"Assigned on: {datetime.utcnow().strftime('%Y-%m-%d')}."
    ),
    context="task assignment"
)

Team decisions:

await self._retain(
    content=(
        f"Team decision by {made_by} on {date}: '{decision}'."
    ),
    context="team decision"
)

Chat history:

await self._retain(
    content=f"Chat from {user}: '{message}' — Agent replied: '{response[:200]}'",
    context="chat"
)

The context label matters more than we initially thought. Hindsight uses it during fact extraction to shape how memories are interpreted. A decision logged as "team decision" is retrieved differently from a task logged as "task assignment" — even if the raw text is similar. This was one of those things that wasn't obvious until we tested it.

The Memory Bank

Hindsight organises memories into banks, identified by a bank_id string you choose. There's no setup required — just pick a name and start calling retain() and recall() against it.

self.memory = Hindsight(
    base_url=HINDSIGHT_BASE_URL,
    api_key=HINDSIGHT_API_KEY,
)

# Anywhere in the agent:
await self.memory.aretain(bank_id="group-project-manager", content=content)
results = await self.memory.arecall(bank_id="group-project-manager", query=query)

The bank persists on Hindsight's side, which means the memories are still there when you restart your server or open a new session the next day. That's the key property that made Keerthana's "wait, it actually remembered" moment possible.

We used Hindsight Cloud to get a hosted instance quickly, which meant zero infrastructure setup. The API key goes in .env and the bank just works.
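The settings load is nothing fancy — here's roughly what ours looks like (a sketch; the variable names match the `HINDSIGHT_BASE_URL` / `HINDSIGHT_API_KEY` constants used in `agent.py`, and we assume `.env` has already been loaded, e.g. via python-dotenv):

```python
import os

def load_hindsight_settings() -> tuple[str, str]:
    """Read Hindsight Cloud settings from the environment.

    Fails loudly at startup if either value is missing, rather than
    letting the first retain() call blow up mid-request.
    """
    base_url = os.getenv("HINDSIGHT_BASE_URL", "")
    api_key = os.getenv("HINDSIGHT_API_KEY", "")
    if not base_url or not api_key:
        raise RuntimeError("Set HINDSIGHT_BASE_URL and HINDSIGHT_API_KEY in .env")
    return base_url, api_key
```

Failing at startup rather than on first use saved us a couple of confusing debugging sessions during the hackathon.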

What Surprised Us

The context label is doing real work. We initially stored everything with a generic context and noticed the recall quality was flatter. Once we split task assignments, decisions, and chat into separate context labels, the agent started pulling the right memories for the right queries much more reliably.

Recall quality scales with what you retain. The first session feels underwhelming — the agent has almost nothing to work with. By the third session, after you've assigned several tasks and logged a few decisions, the project status summary becomes genuinely useful. It references specific people, specific deadlines, and flags things that are still pending. The agent gets smarter as the bank grows.

The async SDK methods matter. We're running inside FastAPI's async event loop, so we used aretain() and arecall() throughout. If you mix sync calls into an async context carelessly, you'll get subtle blocking issues that are annoying to debug. Use the async methods from the start.
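If you're stuck with a sync call you can't avoid, offload it to a worker thread instead of calling it inline. A minimal illustration of the pattern (the blocking function here is a stand-in, not part of our project):

```python
import asyncio
import time

def blocking_retain(content: str) -> str:
    """Stand-in for a synchronous SDK call — blocks whichever thread runs it."""
    time.sleep(0.05)  # simulates network I/O
    return content

async def handle_message(content: str) -> str:
    # BAD: calling blocking_retain(content) directly here would stall the
    # whole event loop, freezing every other in-flight request.
    # OK: push the sync call onto a worker thread so the loop keeps turning.
    return await asyncio.to_thread(blocking_retain, content)
```

`asyncio.to_thread` is the standard-library escape hatch; with Hindsight's native `aretain()` / `arecall()` you don't need it at all.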

The LLM still needs guardrails. Hindsight gives the agent memory, but it doesn't make the LLM more disciplined. We found early on that without explicit instructions to reference memory in the system prompt, the model would sometimes ignore the recalled context entirely and just make things up. The system prompt needs to actively tell the model to use what it's been given.

The Demo Moment

The most satisfying test was this sequence:

  1. Start the server fresh
  2. Assign "Build the frontend UI" to Sinchana, due the next day
  3. Assign "Set up API routes" to Keerthana, due the next day
  4. Log a decision: "We will use React for the frontend" — by Sahana
  5. Stop the server completely
  6. Restart the server
  7. Type: "What tasks are pending?"

The agent replied with both tasks, the correct owners, and the correct deadlines. It also mentioned the React decision in the context of Sinchana's frontend work — a connection we hadn't explicitly asked it to make.

That's Hindsight's observation consolidation doing its job: it doesn't just store raw facts, it synthesises relationships between them. The agent knew that a React decision was relevant to a frontend task because Hindsight had connected those pieces.

Lessons for Your Next Agent

  • Retain early, retain often. Every meaningful interaction should go into memory. Storage is cheap; missing context is expensive.
  • Use descriptive context labels. The context parameter shapes how Hindsight extracts facts. Be specific: "task assignment" beats "data".
  • Recall before every LLM call. Query Hindsight with the user's actual message as the query string — it's usually a better search query than anything you'd construct manually.
  • Bank IDs are just strings. No setup, no migrations. Pick a meaningful name and go.
  • Test across restarts early. The cross-session behaviour is the whole point. If you only test within a single session, you'll miss whether the memory is actually persisting.
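That last lesson is worth rehearsing against a fake backend before you point at Hindsight Cloud. Everything below is a stand-in we wrote for illustration — the point is only the shape of the test: retain, throw the agent away, build a fresh one over the same store, recall.

```python
class FakeBankStore:
    """Stand-in for Hindsight's server-side persistence: it survives
    'restarts' because it outlives any single agent instance."""
    def __init__(self) -> None:
        self.banks: dict[str, list[str]] = {}

    def retain(self, bank_id: str, content: str) -> None:
        self.banks.setdefault(bank_id, []).append(content)

    def recall(self, bank_id: str, query: str) -> list[str]:
        # Crude keyword match — the real service runs semantic, keyword,
        # graph, and temporal search in parallel.
        return [m for m in self.banks.get(bank_id, []) if query.lower() in m.lower()]

class Agent:
    def __init__(self, store: FakeBankStore, bank_id: str = "group-project-manager"):
        self.store, self.bank_id = store, bank_id

    def assign(self, task: str, owner: str) -> None:
        self.store.retain(self.bank_id, f"Task assigned to {owner}: '{task}'. Status: PENDING.")

    def pending(self) -> list[str]:
        return self.store.recall(self.bank_id, "PENDING")

store = FakeBankStore()            # plays the role of Hindsight Cloud
Agent(store).assign("Build the frontend UI", "Sinchana")
restarted = Agent(store)           # "server restart": brand-new agent, same store
print(restarted.pending())
```

Swap `FakeBankStore` for the real client and the same assertion becomes your cross-restart smoke test.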

What's Next

Right now the agent tracks tasks and decisions but has no concept of task completion — you can log a task as done via chat, but there's no structured state transition. That's the obvious next step.
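One way that structured transition could look — a sketch of where we'd take it, not code that exists in the repo yet. The idea is that a status change produces a memory line, so completions become recallable like any other event:

```python
from dataclasses import dataclass

# Legal status moves; anything else is rejected.
VALID_TRANSITIONS = {
    "PENDING": {"IN_PROGRESS", "DONE"},
    "IN_PROGRESS": {"DONE"},
    "DONE": set(),
}

@dataclass
class Task:
    name: str
    owner: str
    status: str = "PENDING"

def transition(task: Task, new_status: str) -> str:
    """Move a task to a new status and return a memory line to retain,
    so the change lands in the bank like any other event."""
    if new_status not in VALID_TRANSITIONS[task.status]:
        raise ValueError(f"Cannot go from {task.status} to {new_status}")
    task.status = new_status
    return f"Task '{task.name}' ({task.owner}) moved to {new_status}."
```

The returned string would go straight through `_retain()` with a context label like "task status change", keeping the pattern from the rest of the agent.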

We'd also like to add per-member memory banks so the agent can maintain a richer model of each person's workload history over time, rather than one shared bank for the whole team.

The code is on GitHub: github.com/SinchanaNagaraj/ai-group-project-manager

If you're building an agent that needs to remember things across sessions, Hindsight is worth a serious look. The retain / recall pattern is simple enough to wire in quickly, and the memory quality is meaningfully better than stuffing everything into a system prompt and hoping for the best.
