Harish Kotra (he/him)


Building a Slack Bot That Actually Remembers: slacktag-oss

How I built an open-source Slack assistant with persistent semantic memory, powered by any LLM and Mem0's managed memory layer — no vector database required.


The problem with Slack bots and memory

Most AI Slack bots have the memory of a goldfish. Every conversation starts from scratch. You ask it about your sprint goals, it gives a great answer, then three days later you ask a follow-up and it has no idea what you're talking about. You end up re-explaining context constantly.

The commercial solution to this is Claude Tag, a Slack integration that maintains genuine conversational continuity. But it's tied to one provider and isn't open source.

slacktag-oss is my attempt to replicate that experience: a Slack bot with real, semantic, persistent memory that works with any LLM behind an OpenAI-compatible endpoint, including ones running entirely on your laptop.


What I built

A Python Slack bot with:

  • Socket Mode for local dev (no public URL needed), HTTP-ready for prod
  • LangChain to abstract LLM calls across any OpenAI-compatible endpoint
  • Mem0 managed cloud for semantic memory — no Qdrant, no Pinecone, no infra to run
  • Three memory scopes: per-channel, per-thread, per-DM
  • Built-in !clear and !memory commands
  • A clean, extensible architecture you can fork and build on

Architecture

Before diving into code, here's the full request lifecycle:

┌─────────────────────────────────────────────────────────────┐
│                         Slack                               │
│  @mention in channel  ──┐                                   │
│  DM to bot            ──┼──► Slack Events API               │
│  Thread reply         ──┘         │                         │
└───────────────────────────────────│─────────────────────────┘
                                    │ (Socket Mode / HTTP)
                                    ▼
┌─────────────────────────────────────────────────────────────┐
│                      slack-bolt (Python)                     │
│   bot.py  ──►  router.py  ──►  handler.py                  │
│                                    │                        │
│                    ┌───────────────┤                        │
│                    │               │                        │
│                    ▼               ▼                        │
│              Mem0 Client      LangChain                     │
│              (managed)        ChatOpenAI                    │
└─────────────────────────────────────────────────────────────┘
                    │
                    ▼
        ┌───────────────────────┐
        │   Mem0 Managed Cloud  │
        │  Vector Embeddings    │
        │  Entity Extraction    │
        │  Deduplication        │
        └───────────────────────┘

The key design decision: Mem0 is the only stateful dependency. There's no database to manage, no Redis, no Qdrant. The bot process itself is stateless — you can restart it freely without losing any memory.


Project structure

slacktag-oss/
├── main.py
├── config/settings.py       ← Pydantic settings from .env
├── core/
│   ├── bot.py               ← Slack Bolt app + event registration
│   ├── handler.py           ← All orchestration logic lives here
│   └── router.py            ← Dispatches channel mentions vs DMs
├── memory/
│   ├── base.py              ← Abstract interface
│   ├── channel_memory.py    ← Channel + thread scoped memory
│   ├── dm_memory.py         ← Per-user private memory
│   └── mem0_store.py        ← Mem0 client factory
├── llm/client.py            ← ChatOpenAI factory
└── tools/registry.py        ← Tool plugin stub (v2)

The memory layer: why Mem0

The typical approach to bot memory is a rolling window: keep the last N messages in the prompt. This breaks down fast — context gets stale, important things fall out of the window, and token costs grow linearly.

Mem0 takes a different approach. When you store a conversation, it:

  1. Runs an extraction pass to pull out facts, entities, and preferences
  2. Deduplicates them against what's already stored
  3. Indexes them as vector embeddings for semantic retrieval

When you later ask a question, you get back the most relevant past memories — not just the most recent ones. A user's preference mentioned three weeks ago will surface when relevant, even if hundreds of messages happened in between.
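
To make the difference from a rolling window concrete, here is a toy sketch. It is not Mem0's algorithm: it scores relevance with plain word overlap (cosine similarity over word counts) where a real system would use learned embeddings, and every function name in it is made up for illustration.

```python
import math
import re
from collections import Counter


def _vectorize(text: str) -> Counter:
    """Toy stand-in for an embedding: lowercase word counts."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def _cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def rolling_window(history: list[str], n: int) -> list[str]:
    """Rolling-window memory: only the last n messages survive."""
    return history[-n:]


def most_relevant(history: list[str], query: str, k: int) -> list[str]:
    """Relevance-based memory: the k messages most similar to the query."""
    q = _vectorize(query)
    ranked = sorted(history, key=lambda m: _cosine(_vectorize(m), q), reverse=True)
    return ranked[:k]


# One important fact, then 200 unrelated messages
history = ["Our API rate limit is 100 req/min per tenant."] + [
    f"Standup note number {i}" for i in range(200)
]
query = "What is the API rate limit per tenant?"

window = rolling_window(history, 20)   # the fact has fallen out of the window
top = most_relevant(history, query, 1)  # the fact is still retrievable
```

The fact stored first is gone from the 20-message window but is still the top hit for the query, which is the behavior the rest of this post relies on.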

Setting up the client

Because we're using Mem0's managed cloud, the entire backend setup is one small factory function:

# memory/mem0_store.py
from mem0 import MemoryClient
from config.settings import settings

def get_mem0_client() -> MemoryClient:
    return MemoryClient(api_key=settings.MEM0_API_KEY)

No vector database config. No embedding model to choose. No collection names to manage.

Memory scoping

The key insight for a Slack bot is that different conversations need different memory boundaries:

# channel_memory.py
def scope_id(self, channel_id: str, thread_ts: str = None) -> str:
    if thread_ts:
        return f"thread:{channel_id}:{thread_ts}"   # isolated thread
    return f"channel:{channel_id}"                   # shared channel

# dm_memory.py
def scope_id(self, user_id: str) -> str:
    return f"dm:{user_id}"                           # private per user

Mem0 uses this string as a user_id — anything stored under channel:C12345 is shared by everyone in that channel. Anything under dm:U67890 is private. Thread memory is completely isolated so a debugging session in a thread doesn't pollute the main channel's memory.
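
As a rough sketch of how the three scopes line up with incoming Slack events, the helper below maps an event payload to a scope key. The function name scope_for_event is hypothetical (the repo splits this logic across channel_memory.py and dm_memory.py), and it assumes the standard channel, user, thread_ts, and channel_type fields of a Slack message event.

```python
def scope_for_event(event: dict) -> str:
    """Map a Slack event payload to a memory scope key (hypothetical helper)."""
    if event.get("channel_type") == "im":
        return f"dm:{event['user']}"                       # private per user
    thread_ts = event.get("thread_ts")
    if thread_ts:
        return f"thread:{event['channel']}:{thread_ts}"    # isolated thread
    return f"channel:{event['channel']}"                   # shared channel


channel_scope = scope_for_event({"channel": "C12345", "user": "U67890"})
thread_scope = scope_for_event(
    {"channel": "C12345", "user": "U67890", "thread_ts": "1700000000.000100"}
)
dm_scope = scope_for_event({"channel": "D11111", "user": "U67890", "channel_type": "im"})
```

Because the three keys never collide, a thread in C12345 and the channel C12345 itself end up as separate memory stores.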

BaseMemory interface

Both ChannelMemory and DMMemory implement the same four-method interface:

# memory/base.py
from abc import ABC, abstractmethod

class BaseMemory(ABC):
    @abstractmethod
    def add(self, messages: list[dict], scope_id: str) -> None: ...

    @abstractmethod
    def search(self, query: str, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def get_all(self, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def clear(self, scope_id: str) -> None: ...

This makes it easy to swap backends later — implement BaseMemory, update the factory, done.
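
For example, a throwaway in-process backend for unit tests could look like the sketch below. InMemoryMemory is not part of the repo, its search is a naive word-overlap match and not a semantic one, and BaseMemory is repeated here only so the snippet runs on its own.

```python
from abc import ABC, abstractmethod


class BaseMemory(ABC):
    """Same four-method interface as memory/base.py."""

    @abstractmethod
    def add(self, messages: list[dict], scope_id: str) -> None: ...

    @abstractmethod
    def search(self, query: str, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def get_all(self, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def clear(self, scope_id: str) -> None: ...


class InMemoryMemory(BaseMemory):
    """Hypothetical test backend: a dict of lists, lost on restart."""

    def __init__(self) -> None:
        self._store: dict[str, list[dict]] = {}

    def add(self, messages: list[dict], scope_id: str) -> None:
        bucket = self._store.setdefault(scope_id, [])
        for msg in messages:
            # Use a "memory" key so entries resemble what the prompt builder reads
            bucket.append({"memory": msg["content"], "role": msg["role"]})

    def search(self, query: str, scope_id: str) -> list[dict]:
        words = set(query.lower().split())
        return [
            m for m in self._store.get(scope_id, [])
            if words & set(m["memory"].lower().split())
        ]

    def get_all(self, scope_id: str) -> list[dict]:
        return list(self._store.get(scope_id, []))

    def clear(self, scope_id: str) -> None:
        self._store.pop(scope_id, None)


mem = InMemoryMemory()
mem.add([{"role": "user", "content": "Rate limit is 100 req/min"}], "channel:C12345")
hits = mem.search("rate limit", "channel:C12345")
```

Swapping this in for the Mem0-backed classes lets handler logic be tested without network calls or an API key.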


The LLM layer: LangChain + any OpenAI-compatible endpoint

# llm/client.py
from langchain_openai import ChatOpenAI
from config.settings import settings

def get_llm() -> ChatOpenAI:
    return ChatOpenAI(
        base_url=settings.LLM_BASE_URL,
        api_key=settings.LLM_API_KEY,
        model=settings.LLM_MODEL,
        temperature=0.7,
        streaming=True,
    )

base_url, api_key, and model are the only things that change between providers, and all three come from .env. Ollama, LM Studio, OpenAI, Groq, Together AI: all work without touching any other code.
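
As an illustration, the provider switch boils down to a lookup like the one below. The helper llm_kwargs and the preset table are hypothetical, and apart from the Ollama URL (the default in this project's settings) the base URLs are the commonly documented ones at the time of writing, so check each provider's docs before relying on them.

```python
# Assumed OpenAI-compatible base URLs; verify against each provider's docs
PROVIDER_BASE_URLS = {
    "ollama": "http://localhost:11434/v1",
    "lmstudio": "http://localhost:1234/v1",
    "openai": "https://api.openai.com/v1",
    "groq": "https://api.groq.com/openai/v1",
    "together": "https://api.together.xyz/v1",
}


def llm_kwargs(provider: str, api_key: str, model: str) -> dict:
    """Build the three provider-specific arguments passed to ChatOpenAI."""
    try:
        base_url = PROVIDER_BASE_URLS[provider]
    except KeyError:
        raise ValueError(f"unknown provider: {provider!r}") from None
    return {"base_url": base_url, "api_key": api_key, "model": model}


# Local default from this project's settings: Ollama with llama3.2
local = llm_kwargs("ollama", api_key="ollama", model="llama3.2")
```

In the bot itself these values simply live in .env as LLM_BASE_URL, LLM_API_KEY, and LLM_MODEL, so no lookup table is needed.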


The handler: where memory meets LLM

handler.py is the heart of the bot. For every request, it:

  1. Checks for built-in commands
  2. Searches Mem0 for semantically relevant past context
  3. Gets recent history for conversational continuity
  4. Builds the LangChain message list
  5. Invokes the LLM
  6. Stores the exchange back in Mem0

# core/handler.py (simplified)
def handle_channel_mention(channel_id, user_id, text, thread_ts=None):
    scope = channel_memory.scope_id(channel_id, thread_ts)

    # Built-in commands short-circuit before hitting the LLM
    if text.strip() == "!clear":
        channel_memory.clear(scope)
        return "Memory cleared."
    if text.strip() == "!memory":
        return format_memories(channel_memory.get_all(scope))

    # Dual retrieval: semantic + recency
    relevant = channel_memory.search(text, scope)
    history  = channel_memory.get_all(scope)

    messages = build_messages(system_prompt, relevant, history, text)
    response = llm.invoke(messages)
    reply    = response.content

    # Store the exchange — Mem0 extracts entities + deduplicates
    channel_memory.add(
        [{"role": "user", "content": text},
         {"role": "assistant", "content": reply}],
        scope,
    )
    return reply
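
The handler above calls format_memories, which isn't shown in this post. A minimal sketch of what such a helper might look like, assuming each stored entry is a dict with a "memory" key (the same shape the prompt builder reads); the limit parameter and the wording of the messages are my own placeholders:

```python
def format_memories(memories: list[dict], limit: int = 20) -> str:
    """Render stored memories as a bulleted Slack message (illustrative only)."""
    facts = [m["memory"] for m in memories if m.get("memory")]
    if not facts:
        return "I don't have any memories for this conversation yet."
    lines = [f"• {fact}" for fact in facts[:limit]]
    if len(facts) > limit:
        # Keep the reply short; Slack messages get unwieldy past a few dozen lines
        lines.append(f"... and {len(facts) - limit} more")
    return "\n".join(lines)


preview = format_memories(
    [{"memory": "API rate limit is 100 req/min per tenant"}, {"id": "no-text"}]
)
```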

Building the prompt

The message list passed to the LLM is assembled in a specific order:

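from langchain_core.messages import AIMessage, HumanMessage, SystemMessage
from config.settings import settings

MAX_HISTORY_MESSAGES = settings.MAX_HISTORY_MESSAGES

def build_messages(system_prompt, relevant_memories, recent_history, user_input):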
    messages = [SystemMessage(content=system_prompt)]

    # Inject relevant memories as a second system message
    if relevant_memories:
        memory_context = "\n".join(
            m["memory"] for m in relevant_memories if "memory" in m
        )
        messages.append(SystemMessage(
            content=f"Relevant context from earlier:\n{memory_context}"
        ))

    # Append recent history
    for entry in recent_history[-MAX_HISTORY_MESSAGES:]:
        if entry.get("role") == "user":
            messages.append(HumanMessage(content=entry["content"]))
        elif entry.get("role") == "assistant":
            messages.append(AIMessage(content=entry["content"]))

    # Current user message always last
    messages.append(HumanMessage(content=user_input))
    return messages

The two-system-message pattern keeps the bot's persona and instructions separate from the injected memory context — cleaner for the model to reason about.


Wiring up Slack

slack-bolt makes event handling clean:

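# core/bot.py
from slack_bolt import App
from config.settings import settings
from core.router import route_dm, route_mention

app = App(token=settings.SLACK_BOT_TOKEN, signing_secret=settings.SLACK_SIGNING_SECRET)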

@app.event("app_mention")
def on_mention(event, say):
    route_mention(event, say)   # channel / thread flow

@app.event("message")
def on_message(event, say):
    if event.get("channel_type") == "im" and not event.get("bot_id"):
        route_dm(event, say)    # DM flow, ignore bot's own messages

router.py extracts the relevant fields and calls the appropriate handler:

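# core/router.py
import re
from core.handler import handle_channel_mention

def route_mention(event, say):
    channel_id = event.get("channel")
    thread_ts  = event.get("thread_ts")
    # app_mention text arrives as "<@U0BOTID> !clear"; strip the leading mention
    # so exact-match commands like "!clear" can fire in the handler
    text       = re.sub(r"^\s*<@[^>]+>\s*", "", event.get("text", ""))

    reply = handle_channel_mention(channel_id, event["user"], text, thread_ts)
    say(text=reply, thread_ts=thread_ts or event["ts"])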

Replies always go into a thread: if the mention was in a thread, the bot stays in that thread; otherwise it starts a new thread under the message that mentioned it.
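
The DM side isn't shown in this post. A plausible sketch, mirroring route_mention, is below. The handle_dm callable is a hypothetical stand-in for whatever DM handler lives in handler.py, and it is passed in as an argument here only so the snippet runs without Slack or Mem0.

```python
def route_dm(event: dict, say, handle_dm) -> None:
    """Extract fields from a DM event and reply in the same conversation."""
    user_id = event["user"]
    text = event.get("text", "").strip()
    if not text:
        return  # nothing to answer, e.g. a file upload without a caption
    reply = handle_dm(user_id, text)
    say(text=reply)


# Exercise the router with stand-ins for Bolt's say() and the real handler
sent = []
route_dm(
    {"user": "U67890", "text": " hello ", "channel_type": "im"},
    say=lambda **kwargs: sent.append(kwargs),
    handle_dm=lambda user_id, text: f"{user_id} said {text}",
)
```

DMs have no mention prefix to strip, and replying without a thread_ts keeps the conversation flat, which is how most people expect a DM with a bot to behave.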


Configuration with Pydantic Settings

All config lives in one place with validation:

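# config/settings.py
# pydantic v2 ships BaseSettings in the separate pydantic-settings package;
# on pydantic v1 the import is `from pydantic import BaseSettings`
from pydantic_settings import BaseSettings

class Settings(BaseSettings):
    SLACK_BOT_TOKEN: str
    SLACK_APP_TOKEN: str
    SLACK_SIGNING_SECRET: str
    LLM_BASE_URL: str = "http://localhost:11434/v1"
    LLM_API_KEY: str = "ollama"
    LLM_MODEL: str = "llama3.2"
    MEM0_API_KEY: str
    BOT_NAME: str = "Claude"
    MAX_HISTORY_MESSAGES: int = 20
    SYSTEM_PROMPT: str = ""

    class Config:
        env_file = ".env"

# Module-level instance that the other modules import
settings = Settings()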

Missing required fields (the Slack tokens, the Mem0 key) raise a ValidationError at startup — fail fast before any event processing begins.


Running it locally

# Get dependencies
pip install -r requirements.txt

# Start the bot (Socket Mode — no public URL needed)
python main.py

That's it. No Docker, no Qdrant, no ngrok. Invite the bot to a channel, @mention it, and it starts building memory from the first message.


What "semantic memory" actually looks like in practice

Here's a realistic example. Day 1:

User: @slacktag Our API rate limit is 100 req/min per tenant. Keep that in mind for capacity planning.
Bot: Got it. I'll factor that in for any capacity discussions.

Day 3 (hundreds of messages later in the channel):

User: @slacktag We're about to onboard 5 new enterprise tenants. Any concerns?
Bot: A few things to consider: with your current API rate limit of 100 req/min per tenant, 5 new enterprise tenants could significantly increase peak load. You may want to review your rate limiting strategy before onboarding...

Mem0 surfaced the rate limit fact from Day 1 because it was semantically relevant to the capacity question — even though it was nowhere in the recent message window.


Deployment path

For production, swap SocketModeHandler for a standard HTTP adapter:

# Using Flask
from flask import Flask, request
from slack_bolt.adapter.flask import SlackRequestHandler

from core.bot import app   # the Bolt App defined in core/bot.py

flask_app = Flask(__name__)
handler = SlackRequestHandler(app)

@flask_app.route("/slack/events", methods=["POST"])
def events():
    return handler.handle(request)

Point your Slack app's Request URL to https://your-domain/slack/events, deploy anywhere (Fly.io, Railway, Cloud Run — all work), and you're done. No state in the server — Mem0 holds everything.


What's next (v2 ideas)

A few extensions that would make this significantly more powerful:

Pluggable tools: tools/registry.py is stubbed out for LangChain tool integration. Adding web search (Tavily, Brave Search) or a code execution sandbox would turn this into a capable agent.

Mem0 graph memory — Mem0 supports a graph mode that tracks relationships between entities across conversations. You could map out who's on which team, what projects are in flight, and surface that context automatically.

Per-channel LLM config — let admins set a different model per channel (e.g., a powerful model for #architecture, a fast cheap model for #random).

Reaction triggers — react with 🧠 to explicitly add a message to memory; react with 🗑️ to remove a fact. Much more controllable than pure auto-extraction.

!summarize — call mem0.get_all() and ask the LLM to produce a readable summary of everything it knows about this channel.
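
A possible starting point for that command is sketched below. It only builds the prompt: build_summary_prompt is a hypothetical name, the wording of the instructions is a placeholder, and the entries are assumed to carry a "memory" key as elsewhere in this post.

```python
def build_summary_prompt(memories: list[dict], channel_name: str) -> list[dict]:
    """Turn stored memories into role/content chat messages for a summary request."""
    facts = [m["memory"] for m in memories if m.get("memory")]
    bullet_list = "\n".join(f"- {fact}" for fact in facts) or "(no stored memories)"
    return [
        {
            "role": "system",
            "content": (
                "Summarize the stored facts below for the channel's members. "
                "Do not add facts that are not listed."
            ),
        },
        {
            "role": "user",
            "content": f"Channel: {channel_name}\nStored facts:\n{bullet_list}",
        },
    ]


prompt = build_summary_prompt(
    [{"memory": "API rate limit is 100 req/min per tenant"}], "#architecture"
)
```

The resulting list can be handed to the LLM the same way as any other message list, and the reply posted back to the channel.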


Getting involved

The codebase is intentionally small. handler.py is ~100 lines. Every module does one thing. If you want to contribute:

git clone https://github.com/harishkotra/slacktag-oss
cd slacktag-oss
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env

Pick any feature from the table in the README, implement it, and open a PR. The architecture is designed to stay simple — add without entangling.


