Last night, with 14 hours left on our CS401 capstone project, I panicked. I opened our team's dashboard, pulled up the critical "Final Presentation Deck" task, and typed a quick message to our AI manager: "Assign the final presentation to Chad."
Instantly, the AI flashed a bright red intervention card on my screen and explicitly blocked the assignment.
"Chad has a documented history of ghosting presentation tasks," the agent warned, overriding my request. "Assign to Sarah instead to save your grade." It actively prevented me from tanking our project. This wasn't a hardcoded if/else statement. This was an AI agent acting as a proactive guardian because we gave it long-term memory.
The Real Problem (Before Hindsight)
Students constantly struggle to coordinate tasks in group projects. If you've ever built a standard AI agent using just an LLM and a vector database (RAG), you know the fatal flaw: it forgets team dynamics the second its context window clears.
Before this hackathon, we experimented with standard coding agents and basic chatbot integrations. They treated every single sprint like a blank slate. If someone missed three deadlines in a row, the AI didn't care. It would cheerfully assign them the most critical database migration task just because their name was mentioned in the prompt.
We didn't need a passive chatbot that just logs tickets. We needed an AI Group Project Manager that actually remembers past decisions, team roles, and task progress, and uses that history to hold the team accountable.
What We Built: TeamSync AI
We built TeamSync AI, a project management platform designed specifically for the chaos of university capstone projects. The core goal was simple: build an agent that remembers behavioral history to recommend better task distribution and keep the project on track.
To make this work, we had to move beyond pure context windows. We needed a system where the AI could dynamically recall who was actually reliable and what parts of the syllabus we were failing to meet.
How We Used Hindsight in the Stack
We built our application using Hindsight, a persistent memory system from Vectorize that allows AI agents to remember, recall, and improve over time.
Our stack consists of a real-time Kanban board interface built with vanilla JS/HTML, a Python FastAPI backend, a SQLite database to track the actual task states, and an LLM to power the reasoning. But Hindsight is the brain that bridges the gap between sprints.
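To make "track the actual task states" concrete, here is a minimal sketch of the kind of SQLite schema the endpoints below assume. The column names and statuses are our guesses at a reasonable shape, not the exact production schema:

```python
import sqlite3

# Hedged sketch: a minimal task-state table with the fields the API
# endpoints below need (task id, team, title, tag, assignee, status).
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE tasks (
        task_id  TEXT PRIMARY KEY,
        team_id  TEXT NOT NULL,
        title    TEXT NOT NULL,
        tag      TEXT NOT NULL,       -- e.g. 'presentation', 'backend'
        assignee TEXT,
        status   TEXT DEFAULT 'todo'  -- 'todo' | 'in_progress' | 'done' | 'failed'
    )
""")
conn.execute(
    "INSERT INTO tasks VALUES (?, ?, ?, ?, ?, ?)",
    ("t-42", "team-1", "Final Presentation Deck", "presentation", "Chad", "todo"),
)
row = conn.execute(
    "SELECT assignee, tag, title FROM tasks WHERE task_id = ?", ("t-42",)
).fetchone()
print(row)  # ('Chad', 'presentation', 'Final Presentation Deck')
```

SQLite is enough here because the board state is small; the behavioral history lives in Hindsight, not in this table.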
Instead of dumping raw chat logs into a database, we explicitly store structured "experiences." When a team member fails to deliver, we trigger a specific endpoint that pushes a heavily weighted memory to the Hindsight bank.
Here is a snippet of how we handle failure tracking in our Python backend:
```python
@app.post("/api/tasks/mark_failed")
async def mark_task_failed(task_id: str, team_id: str):
    # Fetch task details from SQLite
    assignee, tag, title = await get_task_details(task_id, team_id)

    # Record the failure metric
    await record_failure(team_id, assignee, tag)

    # Push the structured experience to Hindsight Memory
    memory_string = (
        f"FREE-RIDER RECORDED: {assignee} failed/ghosted {tag} task {task_id}. "
        f"Do NOT assign {tag} tasks to {assignee}."
    )
    await retain_memory(team_id, memory_string, "failure")

    return {"success": True, "message": f"Recorded: {assignee} ghosted on {tag}"}
```
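For context, here is roughly what the `record_failure` helper does. This is a sketch under assumptions (synchronous for brevity, a simple per-person, per-tag counter in SQLite); the real helper may store more fields:

```python
import sqlite3

# Hedged sketch of record_failure: keep a running failure count per
# (team, assignee, tag) so the metric survives restarts.
conn = sqlite3.connect(":memory:")
conn.execute("""
    CREATE TABLE failures (
        team_id  TEXT,
        assignee TEXT,
        tag      TEXT,
        count    INTEGER DEFAULT 0,
        PRIMARY KEY (team_id, assignee, tag)
    )
""")

def record_failure(team_id: str, assignee: str, tag: str) -> int:
    """Increment and return the failure count for this person on this tag."""
    conn.execute(
        """INSERT INTO failures (team_id, assignee, tag, count) VALUES (?, ?, ?, 1)
           ON CONFLICT(team_id, assignee, tag) DO UPDATE SET count = count + 1""",
        (team_id, assignee, tag),
    )
    return conn.execute(
        "SELECT count FROM failures WHERE team_id=? AND assignee=? AND tag=?",
        (team_id, assignee, tag),
    ).fetchone()[0]

record_failure("team-1", "Chad", "presentation")
print(record_failure("team-1", "Chad", "presentation"))  # 2
```

The counter gives us a hard number to cite ("ghosted 3 presentation tasks"), while the weighted memory string pushed to Hindsight is what steers the agent's future recommendations.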
Whenever the user interacts with the TeamSync AI via the chat window, we intercept the prompt and query Hindsight first:
```python
# Query Hindsight before passing the prompt to the LLM
memory_context, _ = await recall_memory(request.team_id, request.message)

system_prompt = f"""
You are TeamSync AI, a brutally honest AI Group Project Manager.

CURRENT BOARD: {json.dumps(board)}

PAST PROJECT PATTERNS (HINDSIGHT):
{memory_context}
"""
```
The Before & After: Stopping the Ghost
The difference in behavior is terrifyingly effective.
Before Hindsight: If I typed, "Assign the Final Presentation to Chad," the LLM would parse the intent, return a standard JSON object to update the SQLite database, and cheerfully hand the most important task to our biggest slacker.
After Hindsight: The behavior completely changes. Because the agent queries memory before generating a response, it pulls the FREE-RIDER RECORDED tag. We instructed our LLM to output specific "Action Cards" in JSON format. When it sees the free-rider memory, it abandons the standard reassignment and instead outputs an intervention:
```json
{
  "reply": "I am blocking this assignment. Chad has failed to deliver on previous presentation tasks.",
  "action_card": {
    "type": "intervention",
    "warning": "Chad has ghosted past tasks.",
    "recommendation": "Assign to Sarah instead to save your grade.",
    "suggested_assignee": "Sarah",
    "task_title": "Final Presentation Deck",
    "tag": "presentation"
  }
}
```
Our frontend intercepts this action_card and renders a massive, pulsing red alert that prevents the user from making a fatal project management mistake.

The AI intercepts the prompt, queries Hindsight, and actively blocks assigning the task to our known free-rider.
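The dispatch logic for those action cards is simple. A sketch of how the backend might validate the model's JSON before handing it to the frontend; the field names mirror the card above, but `route_reply` itself is illustrative, not our exact implementation:

```python
import json

# Hedged sketch: parse the LLM's raw JSON reply and route on the
# action_card type, so interventions never fall through to a board update.
def route_reply(raw: str) -> str:
    data = json.loads(raw)
    card = data.get("action_card")
    if card and card.get("type") == "intervention":
        # Block the assignment and surface the warning instead.
        return f"BLOCKED: {card['warning']} Suggested: {card['suggested_assignee']}"
    # No intervention: fall through to a normal reply / board update.
    return data["reply"]

raw = json.dumps({
    "reply": "I am blocking this assignment.",
    "action_card": {
        "type": "intervention",
        "warning": "Chad has ghosted past tasks.",
        "suggested_assignee": "Sarah",
    },
})
print(route_reply(raw))  # BLOCKED: Chad has ghosted past tasks. Suggested: Sarah
```

Keeping the intervention check server-side means a client that ignores the card still can't silently complete the blocked assignment.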
Unexpected Behavior: The Panic Mode Triage
The absolute showstopper moment happened when we implemented and tested our "Panic Mode" feature.
We wired up a literal "🚨 Panic Mode" button in the UI for when deadlines are looming. When clicked, it tells the agent how many hours are left and asks for a brutal triage strategy based on the board's current state.
Under a severe time crunch of 14 hours, the agent didn't just passively reassign tasks to faster devs. It recalled past delays from Hindsight, looked at the velocity of our frontend team, and autonomously decided to cut a feature entirely.
It returned a recommendation to immediately drop the 'Design Figma Mockups' task because it was "a luxury we cannot afford right now." It then restructured the remaining hours exclusively around the backend API and the presentation deck, because it remembered that our backend integration always takes 3x longer than we estimate.
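The Panic Mode trigger itself is just prompt construction. A sketch under assumptions: the helper name, prompt wording, and board shape below are ours for illustration, showing how hours-left plus recalled Hindsight patterns get folded into a triage request:

```python
# Hedged sketch of the Panic Mode prompt builder (hypothetical helper).
def build_panic_prompt(hours_left: int, board: list[dict], memory_context: str) -> str:
    open_tasks = [t["title"] for t in board if t["status"] != "done"]
    return (
        f"PANIC MODE: only {hours_left} hours remain.\n"
        f"OPEN TASKS: {', '.join(open_tasks)}\n"
        f"PAST PATTERNS (HINDSIGHT): {memory_context}\n"
        "Produce a brutal triage: which tasks to cut entirely, and how to "
        "reallocate the remaining hours. Account for past velocity."
    )

prompt = build_panic_prompt(
    14,
    [{"title": "Design Figma Mockups", "status": "todo"},
     {"title": "Backend API", "status": "in_progress"},
     {"title": "Presentation Deck", "status": "done"}],
    "Backend integration historically takes 3x the estimate.",
)
print(prompt.splitlines()[0])  # PANIC MODE: only 14 hours remain.
```

The interesting part isn't the prompt; it's that the `memory_context` line is where the "3x longer than we estimate" pattern comes from, which is what pushed the agent to cut a feature rather than just reshuffle assignees.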
The Non-Obvious Lesson
When we started this hackathon, we thought agent memory was just a fancy way to give an LLM an infinite context window. We assumed it was just about making the chat feel more natural.
The real, non-obvious takeaway is that memory is most powerful when used to structure hard interventions.
If your AI agent blindly executes your commands without checking past failures, you don't have an agent. You have a passive chatbot. By storing specific failure states and querying Hindsight before executing user commands, the AI shifts from a passive assistant to a proactive, highly opinionated guardian of your project's success.
It forces you to confront the reality of your team's performance, even when you'd rather ignore it.
Build It Yourself
If you want to build this level of proactive memory into your own AI agent, stop relying on massive context windows and start storing actual experiences.
You can check out the open-source tools we used here:
vectorize-io/hindsight on GitHub (Hindsight: Agent Memory That Learns). From the project's description: most agent memory systems focus on recalling conversation history; Hindsight is focused on making agents that learn, not just remember, and reports state-of-the-art results on the LongMemEval long-term memory benchmark.