Originally posted on AWS Builder.
I work with an AI every day. It's smart. It writes decent code. And every single morning, it forgets who I am.
I open kiro-cli chat, and the first 10 minutes are the same tax I paid yesterday:
Yes, we use pnpm. No, not npm. Yes, Vitest. No, not Jest. The main entry is src/cli.ts. We already decided to use Result<T, E> at the CLI boundary. You told me that last week. I told you that last week. We had this exact conversation.
My teammate calls it the project re-discovery tax. Every session, you pay it. Every. Session.
I got tired of paying it.
Why the obvious fixes didn't work
I tried the obvious things first.
"Just use steering files." Steering files are great for what is this project. They're static markdown you maintain by hand. They don't capture what the AI figured out during a session. The whole point of working with an AI is that it learns things with you. Steering files can't capture that.
"Tell the agent to call a remember() tool when it learns something." I tried this. Claude is inconsistent about when to call it. GPT is inconsistent. Kiro is inconsistent. Every model is inconsistent, because memory management is a side-quest to whatever task you're actually doing. The agent forgets to remember. Turtles all the way down.
"Use a SQLite knowledge graph MCP server." Same problem. Fancier storage, same failure mode. The agent still has to decide when to store.
"Wait for Kiro to ship it." There's a proposal floating around for .kiro/tasks/*.md with auto-read/auto-write. No ETA. I had work to do this week.
The insight that actually fixed it
Here's what clicked for me, and I'll give credit where it's due — it came from a design doc by a coworker; I just productized it:
The agent should be a reader of memory, not a writer.
Writing memory is a different job from using memory. They should not share a context window. The writer can be slow, deliberate, even expensive. The reader needs to be fast, cheap, and running on every session start.
So I split them:
┌──────────────┐    MCP/stdio    ┌──────────────────┐     filesystem      ┌─────────────────────┐
│   Kiro CLI   │ ◄─────────────► │ mcp-agent-memory │ ◄─────────────────► │ agent-memory-daemon │
│ (the reader) │                 │   (MCP server)   │  ~/.agent-memory/   │    (the writer)     │
└──────────────┘                 └──────────────────┘                     └─────────────────────┘
Kiro reads. The MCP server gives it three tools: memory_read, memory_append_session, memory_search. That's it. Nothing fancy.
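To make that concrete, here's a minimal sketch of what a server like this looks like with the MCP TypeScript SDK. The three tool names match the real ones; the file layout and return shapes are my assumptions for illustration, not the package's actual source:

```typescript
import { McpServer } from "@modelcontextprotocol/sdk/server/mcp.js";
import { StdioServerTransport } from "@modelcontextprotocol/sdk/server/stdio.js";
import { z } from "zod";
import { readdir, readFile, writeFile } from "node:fs/promises";
import path from "node:path";
import os from "node:os";

const MEMORY = path.join(os.homedir(), ".agent-memory", "memory");
const SESSIONS = path.join(os.homedir(), ".agent-memory", "sessions");

const server = new McpServer({ name: "agent-memory", version: "0.1.0" });

// memory_read: hand the agent its memory index at session start.
server.tool("memory_read", "Load the memory index", {}, async () => ({
  content: [{ type: "text", text: await readFile(path.join(MEMORY, "MEMORY.md"), "utf8") }],
}));

// memory_append_session: drop a summary where the daemon will find it.
server.tool(
  "memory_append_session",
  "Record a session summary for later consolidation",
  { summary: z.string() },
  async ({ summary }) => {
    await writeFile(path.join(SESSIONS, `${Date.now()}.md`), summary);
    return { content: [{ type: "text", text: "Session summary saved." }] };
  },
);

// memory_search: plain substring scan across the markdown files. No embeddings.
server.tool(
  "memory_search",
  "Search memory files for a string",
  { query: z.string() },
  async ({ query }) => {
    const hits: string[] = [];
    for (const file of await readdir(MEMORY)) {
      const text = await readFile(path.join(MEMORY, file), "utf8");
      for (const line of text.split("\n")) {
        if (line.toLowerCase().includes(query.toLowerCase())) hits.push(`${file}: ${line}`);
      }
    }
    return { content: [{ type: "text", text: hits.join("\n") || "No matches." }] };
  },
);

server.connect(new StdioServerTransport()).catch(console.error);
```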
A background daemon writes. It watches the sessions directory, reads session summaries on a cadence, runs them through an LLM to extract durable facts, and updates markdown files in ~/.agent-memory/memory/.
They never talk directly. The filesystem is the contract. ~/.agent-memory/ is all they share.
Kiro burns zero tokens on memory management. The heavy lifting happens async, outside the chat.
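And a sketch of the writer's core loop, with the same caveat: names and file layout are illustrative, and extractFacts stands in for whichever LLM backend you've configured.

```typescript
import { readdir, readFile, writeFile, rename, mkdir } from "node:fs/promises";
import path from "node:path";
import os from "node:os";

const ROOT = path.join(os.homedir(), ".agent-memory");

// Stand-in for the configured backend (Bedrock, OpenAI, or the Kiro shell-out below).
declare function extractFacts(prompt: string): Promise<string>;

// One consolidation pass: fold any new session summaries into the memory files.
async function consolidate() {
  const sessionsDir = path.join(ROOT, "sessions");
  const memoryFile = path.join(ROOT, "memory", "project-preferences.md");
  const processed = path.join(ROOT, "processed");
  await mkdir(processed, { recursive: true });

  for (const name of await readdir(sessionsDir)) {
    const sessionPath = path.join(sessionsDir, name);
    const session = await readFile(sessionPath, "utf8");
    const existing = await readFile(memoryFile, "utf8").catch(() => "");

    // The LLM does the judgment work: which facts are durable, which are duplicates.
    const updated = await extractFacts(
      `Existing memory:\n${existing}\n\nNew session notes:\n${session}\n\n` +
        "Return the updated memory file: keep durable facts, merge duplicates, drop noise.",
    );
    await writeFile(memoryFile, updated);

    // Move the session aside so the next pass doesn't re-process it.
    await rename(sessionPath, path.join(processed, name));
  }
}

// Run on a cadence, not per-write, so a burst of sessions consolidates in one pass.
setInterval(() => consolidate().catch(console.error), 15 * 60 * 1000);
```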
What it looks like now
Monday:
$ kiro-cli chat
> We use pnpm. Never suggest npm. Vitest not Jest. Main entry is src/cli.ts.
I prefer explicit return types.
[Kiro does work for 20 minutes]
> Great, call memory_append_session with a summary of what we agreed on.
Terminal closes. Life moves on.
Tuesday:
$ kiro-cli chat
[Kiro automatically calls memory_read per my steering rule]
> I see we use pnpm, Vitest (not Jest), src/cli.ts as the main entry, and
you prefer explicit return types. What are we working on today?
No re-explaining. No pasted summary. The AI just remembers.
Between sessions, the daemon woke up, read Monday's session file, extracted the durable facts, deduplicated them against what it already knew, and updated ~/.agent-memory/memory/project-preferences.md. I didn't lift a finger.
The part I'm most proud of: it costs almost nothing
The daemon runs an LLM to do the extraction. LLMs cost money. I didn't want this tool to quietly drain my Bedrock bill.
So I added a Kiro backend. Instead of calling Bedrock or OpenAI, the daemon shells out to kiro-cli itself, running on your existing Kiro credits. Paired with a lean consolidation agent config (ships with the package), each extraction pass costs about 0.01 Kiro credits; the default agent would have been ~0.07. That 7× savings is the difference between "nice-to-have" and "forgot it was running."
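The backend itself is little more than a child-process call. A sketch, with a loud caveat: the kiro-cli flags and the agent name below are assumptions for illustration, not the package's actual invocation; check your installed version.

```typescript
import { execFile } from "node:child_process";
import { promisify } from "node:util";

const run = promisify(execFile);

// Kiro backend: delegate fact extraction to kiro-cli instead of a paid API.
// Flag and agent names here are illustrative, not the package's real invocation.
async function kiroExtract(prompt: string): Promise<string> {
  const { stdout } = await run("kiro-cli", [
    "chat",
    "--no-interactive",               // assumed: answer one prompt and exit
    "--agent", "memory-consolidator", // assumed name for the lean agent config
    prompt,
  ]);
  return stdout.trim();
}
```

Because it's just a subprocess, swapping in a different backend is a one-function change.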
You can still pick Bedrock or OpenAI if that's your stack.
Try it
npm install -g mcp-agent-memory
mcp-agent-memory --setup
The wizard walks you through picking a backend, registering with Kiro (and Claude Desktop and Cursor if you want), and installing the daemon as a LaunchAgent on macOS.
Add this short steering rule at ~/.kiro/steering/memory.md:
At the start of every session, call memory_read (no arguments) to load my
memory index. When you learn something durable about me, my projects, or
my preferences, call memory_append_session with a concise markdown summary.
Restart kiro-cli. That's it.
Reading the memory yourself
The memory isn't a black box. It's just markdown files in ~/.agent-memory/memory/:
$ ls ~/.agent-memory/memory/
MEMORY.md cli-architecture.md project-preferences.md team-processes.md
$ cat ~/.agent-memory/memory/project-preferences.md
# Project Preferences
- Package manager: pnpm (never npm)
- Testing framework: Vitest (not Jest)
- Main entry: src/cli.ts
- Return types: explicit, not inferred
...
$ grep -r "Vitest" ~/.agent-memory/memory/
project-preferences.md:- Testing framework: Vitest (not Jest)
cat works. grep works. git works. If you hate what it stored, delete the file.
What this isn't
- Not a knowledge graph with vector search. If you want that, totalrecallai does it beautifully — SQLite, embeddings, web dashboard, the works.
- Not AgentCore Memory. That's a managed Bedrock service. This runs on your laptop.
- Not a replacement for steering files. Steering is "what is this project." Memory is "what have we learned together." Use both.
Why this flavor exists
If I were going to pay for a heavyweight memory system, I probably wouldn't have built this. But:
- I wanted memory as plain markdown I could read, grep, and version-control.
- I wanted Kiro CLI support specifically (totalrecallai doesn't list it — targets Claude Code, Cursor, Windsurf, Cline).
- I wanted near-zero ongoing cost via the Kiro backend.
- I wanted the MCP server's surface area to be tiny — 3 tools, no dashboard, no SDK.
If that set of constraints sounds right for you, this is your tool. If you want the database-backed semantic-search dashboard experience, try totalrecallai — it's genuinely great at what it does.
Links
- npm: mcp-agent-memory
- GitHub: tverney/mcp-agent-memory
- The async daemon: previous post
- MCP spec: Model Context Protocol
If you try it and something breaks, file an issue. If you've got a pattern for what should be memorable vs. forgettable, drop it in the comments — that's the next hard problem I don't have a great answer for yet. Right now the daemon errs on the side of capturing too much, and I prune by hand with rm.
Tomorrow morning, Kiro will remember who I am. I'm not a stranger to it anymore.