Helen Mireille

Posted on • Originally published at slackclaw.ai

Your OpenClaw Slack Agent Forgets Everything Between Threads

Someone on our team asked the agent "what did we decide about the billing migration?" at 3pm on a Tuesday. The agent had been part of that decision three hours earlier, in a different thread. It had no idea what they were talking about.

This is the memory problem, and it's the most common complaint I hear from teams running OpenClaw in Slack. The agent is smart within a conversation. Between conversations, it has amnesia.

LLMs are stateless. Every new thread starts from zero: system prompt, tool descriptions, and the current message. No memory of the 47 threads the agent participated in yesterday. No awareness that it helped debug a deploy issue this morning. No recollection that the team already decided to use Postgres instead of DynamoDB last week.

For a chatbot, that's fine. For an agent that's supposed to be a team member, it's a fundamental limitation.

Why This Matters More in Slack Than Anywhere Else

A Slack workspace is institutional memory in real time. Decisions get made in threads. Context lives in channel history. Tribal knowledge accumulates in conversations that nobody bookmarks or documents.

When your agent can't access any of that between threads, you get a pattern I've started calling "re-briefing tax." Every time someone asks the agent a question that requires context from a previous conversation, they have to re-explain the context. "We talked about this yesterday in #engineering" doesn't help the agent. It needs the actual content re-stated.

We measured this. About 30% of messages to our agent included some form of re-briefing: "remember, we're using the new API," "as I mentioned earlier," "following up on the deploy issue from this morning." That's 30% of input tokens spent on context the agent should already have.

At scale, it's expensive. More importantly, it's annoying. People stop treating the agent like a team member and start treating it like a tool that needs to be fed instructions every time. Which defeats the purpose.

The Three Kinds of Memory You Need

After six weeks of dealing with this, I think about agent memory in three categories. Most teams implement zero of them. Getting even one working changes the experience dramatically.

Thread memory. This is what you get out of the box. The agent remembers everything within a single Slack thread: messages accumulate in the context window, and the agent can reference earlier parts of the conversation. This works until the thread gets long enough to hit context limits, at which point the agent starts "forgetting" early messages as they fall out of the window. We covered context summarization in a previous article; that's the fix for thread memory specifically.

Session memory. This is what's missing by default. Session memory means the agent retains information across threads within some time window. "We discussed X in #engineering this morning" should be something the agent can recall without the user repeating it. This requires persisting conversation summaries or key facts somewhere the agent can retrieve them.

Institutional memory. The deep stuff. What did the team decide about architecture six months ago? What's our standard process for incident response? Who's the expert on the payments system? This lives in Slack history, Notion docs, GitHub wikis, and people's heads. Making it accessible to the agent is a RAG problem.

Building Session Memory (The Practical Way)

Here's what we built. It's not elegant but it works.

After every thread where the agent participates, it generates a summary: who was involved, what was discussed, what decisions were made, what action items were assigned. This summary gets stored in a simple SQLite database with a timestamp and channel reference.

When the agent receives a new message, before responding it queries the database for recent summaries from the same channel (last 24 hours) and any summaries mentioning the current user (last 48 hours). These get injected into the context as "recent context" after the system prompt.
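A minimal sketch of that injection step, assuming an OpenAI-style chat message list (the role layout and the "Recent context" label here are illustrative, not anything OpenClaw prescribes):

```python
def build_prompt(system_prompt, summaries, user_message):
    """Assemble the model input: the system prompt, then recent
    conversation summaries as a second system message, then the
    user's new message."""
    context = "\n".join(f"- {s}" for s in summaries)
    return [
        {"role": "system", "content": system_prompt},
        {"role": "system", "content": f"Recent context:\n{context}"},
        {"role": "user", "content": user_message},
    ]
```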

The implementation is an MCP server with two tools: store_summary (called at the end of meaningful threads) and get_context (called at the start of new interactions). About 200 lines of Python.
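Here's a stripped-down sketch of the storage logic behind those two tools (the MCP wiring is omitted; the schema and the 24/48-hour windows follow the description above, everything else — names, columns — is illustrative):

```python
import sqlite3
import time

def init_db(path=":memory:"):
    conn = sqlite3.connect(path)
    conn.execute(
        """CREATE TABLE IF NOT EXISTS summaries (
               id INTEGER PRIMARY KEY,
               channel TEXT NOT NULL,
               participants TEXT NOT NULL,  -- comma-separated user IDs
               summary TEXT NOT NULL,
               created_at REAL NOT NULL     -- unix timestamp
           )"""
    )
    return conn

def store_summary(conn, channel, participants, summary):
    """Tool 1: persist a thread summary with channel and timestamp."""
    conn.execute(
        "INSERT INTO summaries (channel, participants, summary, created_at) "
        "VALUES (?, ?, ?, ?)",
        (channel, ",".join(participants), summary, time.time()),
    )
    conn.commit()

def get_context(conn, channel, user):
    """Tool 2: recent summaries from this channel (last 24h) plus any
    summaries mentioning this user (last 48h), newest first."""
    now = time.time()
    rows = conn.execute(
        """SELECT summary FROM summaries
           WHERE (channel = ? AND created_at > ?)
              OR (participants LIKE ? AND created_at > ?)
           ORDER BY created_at DESC""",
        (channel, now - 24 * 3600, f"%{user}%", now - 48 * 3600),
    ).fetchall()
    return [r[0] for r in rows]
```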

The tricky part is deciding when to store a summary. You don't want one for every "thanks!" thread. We trigger storage when a thread exceeds 5 messages and contains at least one tool call or decision-like language ("let's go with," "we should," "the plan is"). This filters out casual conversations while capturing substantive ones.
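That trigger is a few lines of heuristic. A sketch with only the three decision phrases quoted above (a production list would be longer):

```python
import re

# Decision-like language; extend this list for your team's phrasing.
DECISION_LANGUAGE = re.compile(
    r"let's go with|we should|the plan is", re.IGNORECASE
)

def should_store_summary(messages, tool_call_count=0, min_messages=5):
    """Store a summary only for substantive threads: longer than
    min_messages AND containing a tool call or decision-like language."""
    if len(messages) <= min_messages:
        return False
    if tool_call_count > 0:
        return True
    return any(DECISION_LANGUAGE.search(m) for m in messages)
```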

The effect was immediate. The "re-briefing tax" dropped from 30% to about 8%. People started asking follow-up questions in new threads, trusting the agent would have context. When someone asked "what's the status of the billing migration?" the agent pulled the morning's discussion summary and gave an accurate, contextual answer.

The Cost of Memory

Memory isn't free. Every summary you inject is input tokens. Our recent context injection averages about 1,200 tokens per interaction — the last few conversation summaries compressed into key facts.

But here's the math: the re-briefing messages we eliminated averaged 400 tokens each, and they appeared in 30% of interactions. So we went from 120 tokens of re-briefing overhead per average interaction (400 * 0.30) to 1,200 tokens of memory context per interaction. That's a net increase of about 1,080 tokens per interaction.
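Spelled out:

```python
rebrief_tokens = 400    # avg tokens per re-briefing preamble
rebrief_rate = 0.30     # fraction of interactions that included one
memory_tokens = 1200    # avg injected "recent context" per interaction

old_overhead = rebrief_tokens * rebrief_rate   # tokens saved per interaction
net_increase = memory_tokens - old_overhead    # extra input tokens paid
```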

On paper, memory costs more. In practice, the response quality improvement is worth it. The agent answers correctly on the first try instead of asking for clarification or giving a decontextualized response that the user then corrects with a follow-up message. The follow-up messages were often longer than the memory injection.

Net-net, our per-interaction cost went up slightly but our per-resolution cost went down by about 20%. Fewer back-and-forth messages, fewer "that's not what I meant" corrections.

Institutional Memory Is a RAG Problem

Session memory handles the last 24-48 hours. For anything older, you need retrieval.

The obvious approach: index your Slack history and let the agent search it. The less obvious reality: Slack history is noisy. Thousands of messages per day, most of them irrelevant. A naive RAG implementation over raw Slack messages returns garbage because the retrieval can't distinguish between a casual mention of "billing" and a detailed thread about the billing migration architecture.

What works better: index the conversation summaries, not the raw messages. If your session memory system is generating summaries for every substantive thread, those summaries become your RAG corpus. They're pre-filtered for relevance, pre-compressed for density, and pre-structured for retrieval.

We index summaries older than 48 hours into a vector store. When the agent needs historical context, it searches summaries rather than raw messages. The retrieval quality is dramatically better because each summary is a coherent unit of information rather than a random slice of conversation.
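A toy sketch of summary-over-raw retrieval. The bag-of-words "embedding" here is a stand-in for a real embedding model, and the vector store is just a list; in practice you'd swap in a proper model and store:

```python
import math
from collections import Counter

def embed(text):
    """Toy bag-of-words vector; a real system would call an embedding model."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(count * b[word] for word, count in a.items())
    if dot == 0:
        return 0.0
    norm = lambda v: math.sqrt(sum(c * c for c in v.values()))
    return dot / (norm(a) * norm(b))

def search_summaries(query, summaries, top_k=3):
    """Rank stored summaries by similarity to the query."""
    q = embed(query)
    ranked = sorted(summaries, key=lambda s: cosine(q, embed(s)), reverse=True)
    return ranked[:top_k]
```

Because each summary is a coherent, pre-compressed unit, even this crude similarity measure retrieves sensibly; raw message slices give the ranking far less to work with.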

Slack's Real-Time Search API can also help here. Instead of maintaining your own index, you can query Slack's search at runtime for relevant historical messages. The downside is latency (adds 1-3 seconds per query) and the noisiness of raw message search. We use it as a fallback when summary search doesn't find what the agent needs.

What SlackClaw Gets Right

I'll be direct: memory management is the strongest argument for a managed platform over self-hosting.

SlackClaw builds session memory into the agent lifecycle. Conversation summaries are generated automatically, context is injected per-channel, and historical search uses a pre-built index of conversation summaries. You don't build the SQLite database, you don't write the MCP server, you don't tune the summary trigger logic.

The platform also handles something we haven't solved well in our self-hosted setup: cross-channel memory. When a topic discussed in #engineering is relevant to a question in #product, the managed platform can surface that context. Our SQLite approach only checks the current channel and user history; cross-channel awareness would require a more sophisticated retrieval layer that we haven't built yet.

The Minimum Viable Memory

If you take one thing from this: build the summary MCP server. Two tools, 200 lines of Python, SQLite backend. Store a summary after every substantive thread. Inject the last 24 hours of summaries into new interactions.

It won't give you perfect institutional memory. It won't make your agent omniscient about everything that's ever been discussed in your workspace. But it'll stop the amnesia problem that makes people give up on the agent after the first week.

Your agent should know what happened this morning. That's a low bar. Most agents don't clear it.


Helen Mireille is chief of staff at an early-stage tech startup. She writes about the gap between AI agent demos and what actually works in production.
