Lars Winstand

Originally published at standardcompute.com

My agent remembered the whole meeting and still forgot the 5 parts that mattered

A lot of agent demos sell the same fantasy:

“Your agent joins the call, remembers everything, and helps with follow-up later.”

Cool demo. Bad memory model.

I’ve been digging into meeting-memory workflows for agents, and the pattern is pretty consistent: storing the full transcript as “memory” sounds smart, but usually makes the system worse.

The best meeting memory is usually not the transcript.

It’s a compact record of:

  • decisions
  • commitments
  • constraints
  • deadlines
  • open questions

That’s the stuff that survives into future work.

Everything else is archive.

The transcript is not memory

A transcript feels like memory because it contains everything.

That’s exactly the problem.

If you dump 8,000 words from a Zoom call into OpenClaw, an n8n workflow, a Zapier automation, or a custom agent using the OpenAI Responses API, you haven’t created useful long-term memory.

You created a blob.

And blobs are expensive.

They also break retrieval in subtle ways:

  • irrelevant tangents get pulled into later runs
  • context windows fill up with stale details
  • the model latches onto the wrong part of the conversation
  • follow-up drafts sound informed but miss the actual decision

For most automation workflows, the useful output of a meeting is boring and structured:

  • what was decided
  • who owns what
  • when it’s due
  • what constraints were stated
  • what’s still unresolved
  • what durable preferences matter later

That is active memory.

The transcript is evidence.

Those are not the same thing.

What your agent needs a week later

This is the only question that matters:

What information survives into the next task without poisoning unrelated runs?

If I open an agent next Tuesday to draft a client follow-up, I do not want the full conversational replay.

I want this:

  1. The decision
  2. The commitments and owners
  3. The deadline
  4. The constraint that changes execution
  5. The unresolved blocker

That’s the memory object.

Not the whole meeting.

Here’s the wrong shape:

{
  "memory": "full 45-minute transcript with side conversations, jokes, false starts, and abandoned ideas"
}

Here’s the useful shape:

{
  "meeting_id": "2026-05-18-client-sync",
  "decisions": [
    "Use quarterly rollout instead of monthly"
  ],
  "commitments": [
    {
      "owner": "Alicia",
      "task": "Send revised pricing",
      "due_date": "2026-05-21"
    }
  ],
  "constraints": [
    "Client cannot use Google Workspace add-ons"
  ],
  "open_questions": [
    "Need legal approval for data retention terms"
  ],
  "preferences": [
    "Client prefers implementation updates by email, not Slack"
  ]
}

Small. Cheap. Reusable.

That’s what you want to retrieve later.

Most memory systems fail in the plumbing first

The funny part is that teams often debate vector DBs and memory graphs before fixing the basic workflow.

But production failures usually happen lower down.

A few examples from real-world agent tooling discussions:

  • context limits get exceeded because the system keeps appending history forever
  • retrieval depends on a human click somewhere in the loop
  • notes live in tools that aren’t automation-safe
  • the wrong chunk gets pulled in because everything is semantically similar
  • stale meeting details leak into a new task and the model treats them as current

That’s not a model intelligence issue.

That’s a systems design issue.

If your memory strategy is basically “keep shoving more text into context until something catches fire,” your agent does not have long-term memory.

It has a delayed failure mode.
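
One way to stop the keep-appending failure is a hard budget on what gets injected into a run. The sketch below is a minimal illustration, not a production tokenizer: it assumes roughly 4 characters per token (a common rule of thumb for English text), and `approxTokens` and `fitToBudget` are hypothetical names.

```typescript
// Rough heuristic (~4 characters per token); not a real tokenizer.
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// Walks facts in priority order and keeps each one that still fits the budget.
// Facts that do not fit are dropped whole rather than truncated mid-sentence.
function fitToBudget(factsByPriority: string[], maxTokens: number): string[] {
  const kept: string[] = [];
  let used = 0;
  for (const fact of factsByPriority) {
    const cost = approxTokens(fact);
    if (used + cost > maxTokens) continue; // too big, try the next fact
    kept.push(fact);
    used += cost;
  }
  return kept;
}

const injected = fitToBudget(
  [
    "Use quarterly rollout instead of monthly",
    "Client cannot use Google Workspace add-ons",
  ],
  12
);
console.log(injected.length); // 1: the second fact would exceed the budget
```

The point of the cap is that context size stops growing with the age of the client relationship; what falls outside the budget stays in the archive.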

Why storing everything gets expensive fast

Because every extra chunk of text becomes future token debt.

This compounds in agentic workflows:

  • one retrieval call becomes three
  • one follow-up draft becomes a transcript load, a summary pass, a reranker call, and a final generation
  • one client record becomes six weeks of stale context getting dragged into every task
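
To make the token debt concrete, here is a back-of-the-envelope sketch. It assumes about 4 characters per token, which is only a rule of thumb, and the sizes and run counts are hypothetical numbers chosen for illustration.

```typescript
// Rough heuristic: ~4 characters per token for English text.
// This is an approximation, not a real tokenizer.
const CHARS_PER_TOKEN = 4;

function estimateTokens(text: string): number {
  return Math.ceil(text.length / CHARS_PER_TOKEN);
}

// Total input tokens spent re-sending the same memory across many runs.
function memoryTokenDebt(memoryText: string, runs: number): number {
  return estimateTokens(memoryText) * runs;
}

// Hypothetical sizes: a 48,000-character transcript (roughly 8,000 words)
// versus a 600-character structured memory record.
const transcriptText = "x".repeat(48000);
const structuredText = "x".repeat(600);

const runsPerMonth = 50;
console.log(memoryTokenDebt(transcriptText, runsPerMonth)); // 600000
console.log(memoryTokenDebt(structuredText, runsPerMonth)); // 7500
```

Under those assumptions the transcript costs 80 times more input tokens per month than the structured record, before any summary or reranking calls are added on top.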

And if you’re paying per token, bad memory design turns directly into cost anxiety.

You start asking questions like:

  • Should this workflow really run every time?
  • Can we afford richer follow-up?
  • Do we need to truncate memory harder?
  • Why did a simple CRM update suddenly cost 10x more?

That’s one reason this problem matters so much for teams building AI agents in n8n, Make, Zapier, OpenClaw, or custom pipelines.

The architecture choice is not just about quality.

It’s also about whether the workflow stays predictable under load.

If you’re running a lot of agent calls, this is exactly where flat-rate infrastructure gets interesting. Standard Compute is built around that reality: OpenAI-compatible API access, but without the usual per-token panic, plus routing across models like GPT-5.4, Claude Opus 4.6, and Grok 4.20. That doesn’t fix bad memory architecture for you, but it removes a lot of the billing pain when you’re iterating on agent workflows.

The memory model that actually works

The pattern I keep coming back to is simple:

  1. Keep the full transcript as archive
  2. Extract structured facts immediately after the meeting
  3. Retrieve only the structured facts by default
  4. Fall back to the transcript only when needed

That’s it.

Here’s the comparison:

| Memory style | What happens later |
| --- | --- |
| Raw transcript memory | High recall, low precision; large context cost; irrelevant details leak into future runs |
| Structured meeting memory | Stores decisions, owners, deadlines, and constraints with low context footprint |
| Searchable archive plus extracted memory | Keeps full transcript outside active context and retrieves only when needed |

The third option wins most of the time.

A practical extraction pipeline

If I were building this today, I’d make the workflow explicit.

Step 1: store the transcript

Put the raw meeting transcript somewhere durable.

Examples:

  • S3
  • Postgres
  • Supabase Storage
  • a document store
  • your CRM as an attachment

Step 2: extract durable facts

Run a post-processing step that converts the transcript into structured memory.

Example prompt:

Extract only durable follow-up facts from this meeting transcript.

Return JSON with:
- decisions
- commitments { owner, task, due_date }
- constraints
- open_questions
- preferences (durable ones only)

Rules:
- Ignore small talk, false starts, and abandoned ideas
- Do not infer commitments unless explicitly stated
- Do not include temporary details unless they affect future work
- If something is uncertain, put it in open_questions instead of decisions
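
A minimal sketch of wiring that prompt into a post-processing step. The `ModelCall` type is a hypothetical stand-in for whatever LLM client you use, not a real SDK signature, and the stub model exists only so the example runs without a network call.

```typescript
// Hypothetical stand-in for an LLM client: takes a prompt, returns raw text.
type ModelCall = (prompt: string) => Promise<string>;

// Joins the extraction instructions and the transcript into one prompt.
function buildExtractionPrompt(instructions: string, transcript: string): string {
  return `${instructions.trim()}\n\nTranscript:\n${transcript.trim()}`;
}

// Runs the extraction and parses the reply. Malformed JSON fails loudly
// so a bad reply is never persisted as memory.
async function extractMeetingMemory(
  instructions: string,
  transcript: string,
  callModel: ModelCall
): Promise<unknown> {
  const raw = await callModel(buildExtractionPrompt(instructions, transcript));
  try {
    return JSON.parse(raw);
  } catch {
    throw new Error("Model reply was not valid JSON; refusing to store it");
  }
}

// Stub model for illustration only; a real call would hit your provider.
const stubModel: ModelCall = async () =>
  JSON.stringify({
    decisions: ["Use quarterly rollout instead of monthly"],
    commitments: [],
    constraints: [],
    open_questions: [],
  });
```

The function returns `unknown` on purpose: the parsed reply is untrusted until it has passed validation.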

Step 3: validate the output

Do not trust free-form extraction blindly.

At minimum:

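// Minimal structural check: throws on the first field that is missing or not an array.
function validateMeetingMemory(memory: any) {
  // Guard first so null or a non-object fails with a clear message.
  if (memory === null || typeof memory !== "object") throw new Error("memory must be an object");
  if (!Array.isArray(memory.decisions)) throw new Error("decisions must be an array");
  if (!Array.isArray(memory.commitments)) throw new Error("commitments must be an array");
  if (!Array.isArray(memory.constraints)) throw new Error("constraints must be an array");
  if (!Array.isArray(memory.open_questions)) throw new Error("open_questions must be an array");
  return memory;
}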
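
The array check does not look inside each commitment. A dependency-free sketch of that extra step follows. The `isIsoDate` helper and the `YYYY-MM-DD` date format are assumptions taken from the example record, not requirements of any library.

```typescript
interface Commitment {
  owner: string;
  task: string;
  due_date?: string;
}

// Assumes dates are written as YYYY-MM-DD, as in the example record.
function isIsoDate(value: string): boolean {
  if (!/^\d{4}-\d{2}-\d{2}$/.test(value)) return false;
  const parsed = new Date(`${value}T00:00:00Z`);
  // Reject impossible dates such as 2026-02-30: depending on the engine they
  // parse as invalid or roll over into the next month, and both cases fail here.
  return !Number.isNaN(parsed.getTime()) && parsed.toISOString().startsWith(value);
}

function isCommitment(value: unknown): value is Commitment {
  if (typeof value !== "object" || value === null) return false;
  const c = value as Record<string, unknown>;
  const owner = c["owner"];
  const task = c["task"];
  const due = c["due_date"];
  if (typeof owner !== "string" || owner.trim() === "") return false;
  if (typeof task !== "string" || task.trim() === "") return false;
  // due_date is optional, but if present it must be a real calendar date.
  return due === undefined || (typeof due === "string" && isIsoDate(due));
}

// Throws with the index of the first malformed entry.
function validateCommitments(items: unknown[]): Commitment[] {
  const invalidIndex = items.findIndex((item) => !isCommitment(item));
  if (invalidIndex !== -1) {
    throw new Error(`commitments[${invalidIndex}] is malformed`);
  }
  return items as Commitment[];
}
```

Checking the date matters more than it looks: a follow-up agent that trusts a nonexistent deadline will confidently schedule around it.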

Better: use JSON Schema or Zod.

import { z } from "zod";

const CommitmentSchema = z.object({
  owner: z.string(),
  task: z.string(),
  due_date: z.string().optional()
});

const MeetingMemorySchema = z.object({
  meeting_id: z.string(),
  decisions: z.array(z.string()),
  commitments: z.array(CommitmentSchema),
  constraints: z.array(z.string()),
  open_questions: z.array(z.string()),
  preferences: z.array(z.string()).optional()
});

Step 4: retrieve selectively

When the next task starts, pull the structured memory first.

Only hit the full transcript if the task actually needs nuance.

Pseudo-flow:

new task arrives
  -> identify related project/client
  -> fetch structured meeting memory
  -> inject only relevant facts into prompt
  -> if ambiguity remains, search transcript archive
  -> generate output
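
That flow can be sketched in a few lines. The `MemorySources` interface is hypothetical and stands in for your database and transcript archive. The fallback rule used here (search the archive only when no stored fact mentions the task keyword) is one simple assumption about what "ambiguity remains" means, not the only reasonable one.

```typescript
interface MeetingFacts {
  meetingId: string;
  decisions: string[];
  constraints: string[];
  openQuestions: string[];
}

// Hypothetical lookups; in practice these would query your DB and archive.
interface MemorySources {
  factsForClient(clientId: string): MeetingFacts[];
  searchTranscripts(clientId: string, query: string): string[];
}

interface TaskContext {
  facts: string[];
  transcriptExcerpts: string[];
}

// Structured facts first. The transcript archive is searched only when
// no stored fact mentions the task keyword.
function buildTaskContext(
  clientId: string,
  keyword: string,
  sources: MemorySources
): TaskContext {
  const needle = keyword.toLowerCase();
  const facts: string[] = [];
  for (const meeting of sources.factsForClient(clientId)) {
    const all = [...meeting.decisions, ...meeting.constraints, ...meeting.openQuestions];
    for (const fact of all) {
      if (fact.toLowerCase().includes(needle)) facts.push(fact);
    }
  }
  const transcriptExcerpts =
    facts.length === 0 ? sources.searchTranscripts(clientId, keyword) : [];
  return { facts, transcriptExcerpts };
}
```

The transcript search is the expensive path, so it sits behind the cheap one rather than running alongside it.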

Example: bad vs good follow-up prompt

Bad

Here is the full transcript from last week's 45-minute client call.
Use it to draft a follow-up email.

Better

Draft a follow-up email using these meeting artifacts:

Decisions:
- Use quarterly rollout instead of monthly

Commitments:
- Alicia will send revised pricing by 2026-05-21

Constraints:
- Client cannot use Google Workspace add-ons

Open questions:
- Legal approval needed for retention terms

Tone:
- concise and implementation-focused

The second prompt is cheaper, cleaner, and usually better.
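
The better prompt does not need to be written by hand each time. Here is a sketch of rendering it from a stored memory record. The field names follow the example JSON earlier in the post, while `renderFollowUpPrompt` and its helpers are hypothetical names.

```typescript
interface PromptCommitment {
  owner: string;
  task: string;
  due_date?: string;
}

interface PromptMemory {
  decisions: string[];
  commitments: PromptCommitment[];
  constraints: string[];
  open_questions: string[];
}

// Renders one titled bullet section, or nothing when the list is empty.
function renderSection(title: string, items: string[]): string {
  if (items.length === 0) return "";
  return `${title}:\n${items.map((item) => `- ${item}`).join("\n")}\n\n`;
}

// Turns { owner, task, due_date } into "Owner will task by date".
function describeCommitment(c: PromptCommitment): string {
  const task = c.task.charAt(0).toLowerCase() + c.task.slice(1);
  return c.due_date
    ? `${c.owner} will ${task} by ${c.due_date}`
    : `${c.owner} will ${task}`;
}

function renderFollowUpPrompt(memory: PromptMemory, tone: string): string {
  return (
    "Draft a follow-up email using these meeting artifacts:\n\n" +
    renderSection("Decisions", memory.decisions) +
    renderSection("Commitments", memory.commitments.map(describeCommitment)) +
    renderSection("Constraints", memory.constraints) +
    renderSection("Open questions", memory.open_questions) +
    `Tone:\n- ${tone}`
  );
}
```

Empty sections are skipped, so a meeting with no open questions produces a shorter prompt instead of an empty heading the model might try to fill.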

What should count as long-term memory?

My rule is simple:

Store only facts that improve future actions outside the original meeting.

Usually keep these:

  • Decisions: approved, rejected, changed
  • Commitments: owner, task, deadline
  • Constraints: legal, budget, vendor, security, technical
  • Durable preferences: communication style, tooling preferences, review expectations
  • Project facts: systems, dependencies, definitions, stakeholders
  • Open questions: unresolved blockers that matter later

Usually do not keep these as active memory:

  • full back-and-forth conversation
  • speculative ideas that were never adopted
  • one-off anecdotes
  • emotional interpretation presented as fact
  • temporary details with no follow-up value

That last list is where a lot of “smart memory” systems quietly go off the rails.

Layered memory is the right compromise

There are real cases where transcripts matter.

Executive assistant agents, recruiting workflows, account management, research agents, and other relationship-heavy systems sometimes need nuance that a checklist can’t capture.

Fine.

Still don’t use the transcript as default active memory.

Use layers:

Layer 1: raw archive

Full transcript, recording, notes.

Layer 2: structured facts

Decisions, commitments, constraints, preferences, blockers.

Layer 3: compact reusable summary

A short high-confidence summary for recurring context.

For example:

{
  "client_profile": {
    "communication_style": "prefers concise implementation updates",
    "tooling_constraints": ["cannot use Google Workspace add-ons"],
    "approval_rules": ["legal reviews retention language"]
  }
}

That gives you nuance without turning every future run into transcript archaeology.
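
As a rough sketch of how Layer 3 might be derived from Layer 2, the function below merges constraints and preferences from several meeting records into one deduplicated profile. It is a simplification under stated assumptions: the `ClientProfile` shape is hypothetical and flatter than the example above, and a real system would also need to retire facts that later meetings overturn.

```typescript
interface MeetingRecord {
  constraints: string[];
  preferences?: string[];
}

// Hypothetical, simplified profile shape for illustration.
interface ClientProfile {
  constraints: string[];
  preferences: string[];
}

// Case-insensitive dedupe that keeps the first spelling seen.
function dedupeFacts(values: string[]): string[] {
  const seen = new Set<string>();
  const unique: string[] = [];
  for (const value of values) {
    const trimmed = value.trim();
    const key = trimmed.toLowerCase();
    if (key !== "" && !seen.has(key)) {
      seen.add(key);
      unique.push(trimmed);
    }
  }
  return unique;
}

// Merges per-meeting facts into one compact, reusable profile.
function buildClientProfile(records: MeetingRecord[]): ClientProfile {
  const constraints: string[] = [];
  const preferences: string[] = [];
  for (const record of records) {
    constraints.push(...record.constraints);
    preferences.push(...(record.preferences ?? []));
  }
  return {
    constraints: dedupeFacts(constraints),
    preferences: dedupeFacts(preferences),
  };
}
```

The profile stays small because repeated facts collapse into one entry, no matter how many meetings restated them.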

If you use n8n, Make, Zapier, or OpenClaw

This is where the advice gets very practical.

If your agent is part of an automation stack, optimize for machine-friendly state.

That means:

  • structured JSON over giant summaries
  • deterministic retrieval over “maybe semantic search finds it”
  • explicit IDs for meetings, projects, and clients
  • schema validation before persistence
  • archive and active memory stored separately

A simple pattern looks like this:

meeting recorded
  -> transcript generated
  -> extraction step creates meeting_memory.json
  -> JSON stored in DB keyed by client_id + project_id + meeting_id
  -> future workflow fetches only matching memory records
  -> transcript searched only on fallback

If you skip this and just keep appending transcripts to context, the system may look fine in demos and slowly become useless in production.
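
The keyed storage in that pattern can be sketched with an in-memory map. `MeetingMemoryStore` is a hypothetical class standing in for a real database table, and ordering by `meetingId` assumes date-prefixed IDs like the `2026-05-18-client-sync` example.

```typescript
interface MemoryKey {
  clientId: string;
  projectId: string;
  meetingId: string;
}

interface StoredMemory {
  key: MemoryKey;
  decisions: string[];
}

class MeetingMemoryStore {
  private records = new Map<string, StoredMemory>();

  // JSON-encoding the parts avoids collisions that a plain separator could cause.
  private static toId(k: MemoryKey): string {
    return JSON.stringify([k.clientId, k.projectId, k.meetingId]);
  }

  // Saving the same key twice overwrites, so re-running extraction is safe.
  save(record: StoredMemory): void {
    this.records.set(MeetingMemoryStore.toId(record.key), record);
  }

  // Deterministic lookup: exact client and project match, no similarity search.
  forProject(clientId: string, projectId: string): StoredMemory[] {
    return Array.from(this.records.values())
      .filter((r) => r.key.clientId === clientId && r.key.projectId === projectId)
      .sort((a, b) =>
        a.key.meetingId < b.key.meetingId ? -1 : a.key.meetingId > b.key.meetingId ? 1 : 0
      );
  }
}
```

Because the lookup is an exact match on IDs, a record from another client can never be pulled in just because it sounds similar.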

The real job of agent memory management

The goal is not to make an agent remember a meeting like a human.

The goal is to make future work start with the right facts and none of the wrong ones.

That’s a much stricter bar.

And honestly, it’s the useful one.

Good agent memory is filtration.

Not hoarding.

Keep the archive.
Extract the durable facts.
Retrieve only what improves the next step.

If your agent forgets the meeting five minutes later, that’s annoying.

If it remembers the wrong parts for the next five weeks, that’s worse.

That’s the tradeoff most memory systems still get backward.

If you’re building agent workflows at any real volume, this is also where infra choices matter. Better memory design reduces waste. Flat-rate compute removes the constant token math while you iterate. That combination is a lot more practical than pretending every transcript deserves to live forever in active context.
