How I built an open-source Slack assistant with persistent semantic memory, powered by any LLM and Mem0's managed memory layer — no vector database required.
The problem with Slack bots and memory
Most AI Slack bots have the memory of a goldfish. Every conversation starts from scratch. You ask it about your sprint goals, it gives a great answer, then three days later you ask a follow-up and it has no idea what you're talking about. You end up re-explaining context constantly.
The commercial solution to this is Claude Tag — a Slack integration that maintains genuine conversational continuity. But it's tied to one provider and not open-source.
slacktag-oss is my attempt to replicate that experience: a Slack bot with real, semantic, persistent memory that works with any LLM, including ones running entirely on your laptop.
What I built
A Python Slack bot with:
- Socket Mode for local dev (no public URL needed), HTTP-ready for prod
- LangChain to abstract LLM calls across any OpenAI-compatible endpoint
- Mem0 managed cloud for semantic memory — no Qdrant, no Pinecone, no infra to run
- Three memory scopes: per-channel, per-thread, per-DM
- Built-in !clear and !memory commands
- A clean, extensible architecture you can fork and build on
Architecture
Before diving into code, here's the full request lifecycle:
┌─────────────────────────────────────────────────────────────┐
│                            Slack                            │
│   @mention in channel ──┐                                   │
│   DM to bot           ──┼──► Slack Events API               │
│   Thread reply        ──┘         │                         │
└───────────────────────────────────│─────────────────────────┘
                                    │  (Socket Mode / HTTP)
                                    ▼
┌─────────────────────────────────────────────────────────────┐
│                     slack-bolt (Python)                     │
│   bot.py ──► router.py ──► handler.py                       │
│                                   │                         │
│                   ┌───────────────┤                         │
│                   │               │                         │
│                   ▼               ▼                         │
│              Mem0 Client      LangChain                     │
│               (managed)      ChatOpenAI                     │
└─────────────────────────────────────────────────────────────┘
                    │
                    ▼
        ┌───────────────────────┐
        │  Mem0 Managed Cloud   │
        │  Vector Embeddings    │
        │  Entity Extraction    │
        │  Deduplication        │
        └───────────────────────┘
The key design decision: Mem0 is the only stateful dependency. There's no database to manage, no Redis, no Qdrant. The bot process itself is stateless — you can restart it freely without losing any memory.
Project structure
slacktag-oss/
├── main.py
├── config/settings.py        ← Pydantic settings from .env
├── core/
│   ├── bot.py                ← Slack Bolt app + event registration
│   ├── handler.py            ← All orchestration logic lives here
│   └── router.py             ← Dispatches channel mentions vs DMs
├── memory/
│   ├── base.py               ← Abstract interface
│   ├── channel_memory.py     ← Channel + thread scoped memory
│   ├── dm_memory.py          ← Per-user private memory
│   └── mem0_store.py         ← Mem0 client factory
├── llm/client.py             ← ChatOpenAI factory
└── tools/registry.py         ← Tool plugin stub (v2)
The memory layer: why Mem0
The typical approach to bot memory is a rolling window: keep the last N messages in the prompt. This breaks down fast: context gets stale, important details fall out of the window, and token costs grow linearly with the window size.
Mem0 takes a different approach. When you store a conversation, it:
- Runs an extraction pass to pull out facts, entities, and preferences
- Deduplicates them against what's already stored
- Indexes them as vector embeddings for semantic retrieval
When you later ask a question, you get back the most relevant past memories — not just the most recent ones. A user's preference mentioned three weeks ago will surface when relevant, even if hundreds of messages happened in between.
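To make the difference between recency and relevance concrete, here is a toy sketch. It is not how Mem0 works internally: the word-count "embedding" and the `search` helper are stand-ins I made up for illustration, where a real system would use dense vector embeddings.

```python
import math
import re
from collections import Counter


def embed(text: str) -> Counter:
    """Toy 'embedding': lowercase word counts (real systems use dense vectors)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def search(memories: list[str], query: str, top_k: int = 1) -> list[str]:
    """Rank stored memories by similarity to the query, most similar first."""
    q = embed(query)
    ranked = sorted(memories, key=lambda m: cosine(embed(m), q), reverse=True)
    return ranked[:top_k]


memories = [
    "API rate limit is 100 req/min per tenant",  # oldest
    "Standup moved to 10am on Tuesdays",
    "Deploy freeze starts Friday",               # newest
]
query = "what is the rate limit per tenant"

recency_hit = memories[-1:]               # what a 1-message rolling window keeps
semantic_hit = search(memories, query)    # what relevance ranking returns
```

The rolling window keeps the deploy-freeze note because it is newest, while the similarity ranking returns the rate-limit fact even though it is the oldest entry.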
Setting up the client
Because we're using Mem0's managed cloud, the entire backend is three lines:
# memory/mem0_store.py
from mem0 import MemoryClient
from config.settings import settings

def get_mem0_client() -> MemoryClient:
    return MemoryClient(api_key=settings.MEM0_API_KEY)
No vector database config. No embedding model to choose. No collection names to manage.
Memory scoping
The key insight for a Slack bot is that different conversations need different memory boundaries:
# channel_memory.py
def scope_id(self, channel_id: str, thread_ts: str = None) -> str:
    if thread_ts:
        return f"thread:{channel_id}:{thread_ts}"  # isolated thread
    return f"channel:{channel_id}"                 # shared channel

# dm_memory.py
def scope_id(self, user_id: str) -> str:
    return f"dm:{user_id}"                         # private per user
Mem0 uses this string as a user_id — anything stored under channel:C12345 is shared by everyone in that channel. Anything under dm:U67890 is private. Thread memory is completely isolated so a debugging session in a thread doesn't pollute the main channel's memory.
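A quick way to see the isolation is to run the scope functions against a plain dict. This is a sketch only: the `store` dict and `remember` helper are hypothetical stand-ins for the memory backend, and the two functions are standalone copies of the `scope_id` methods above.

```python
from typing import Optional


def channel_scope_id(channel_id: str, thread_ts: Optional[str] = None) -> str:
    """Thread replies get their own scope; everything else shares the channel scope."""
    if thread_ts:
        return f"thread:{channel_id}:{thread_ts}"
    return f"channel:{channel_id}"


def dm_scope_id(user_id: str) -> str:
    """Each user's DM conversation is private to that user."""
    return f"dm:{user_id}"


# Hypothetical stand-in for the memory backend: one bucket per scope string.
store: dict[str, list[str]] = {}


def remember(scope: str, fact: str) -> None:
    store.setdefault(scope, []).append(fact)


remember(channel_scope_id("C12345"), "Sprint ends Friday")
remember(channel_scope_id("C12345", "1700000000.000100"), "Stack trace points at auth middleware")
remember(dm_scope_id("U67890"), "Prefers short answers")
```

Three writes land in three separate buckets, so the thread's debugging notes never show up when the channel scope is queried.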
BaseMemory interface
Both ChannelMemory and DMMemory implement the same four-method interface:
# memory/base.py
from abc import ABC, abstractmethod

class BaseMemory(ABC):
    @abstractmethod
    def add(self, messages: list[dict], scope_id: str) -> None: ...

    @abstractmethod
    def search(self, query: str, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def get_all(self, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def clear(self, scope_id: str) -> None: ...
This makes it easy to swap backends later — implement BaseMemory, update the factory, done.
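As an illustration of what a swapped-in backend could look like, here is a minimal in-memory implementation that might serve as a test double. `InMemoryMemory` is a hypothetical class, not part of the repo; its keyword matching is a crude stand-in for semantic search, and the `"memory"` key mirrors the record shape the prompt builder reads.

```python
from abc import ABC, abstractmethod


class BaseMemory(ABC):
    @abstractmethod
    def add(self, messages: list[dict], scope_id: str) -> None: ...

    @abstractmethod
    def search(self, query: str, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def get_all(self, scope_id: str) -> list[dict]: ...

    @abstractmethod
    def clear(self, scope_id: str) -> None: ...


class InMemoryMemory(BaseMemory):
    """Hypothetical test double: keeps memories in a dict, no network calls."""

    def __init__(self) -> None:
        self._store: dict[str, list[dict]] = {}

    def add(self, messages: list[dict], scope_id: str) -> None:
        bucket = self._store.setdefault(scope_id, [])
        for msg in messages:
            bucket.append({"memory": msg["content"], "role": msg["role"]})

    def search(self, query: str, scope_id: str) -> list[dict]:
        # Naive keyword overlap stands in for semantic search.
        words = set(query.lower().split())
        return [
            m for m in self._store.get(scope_id, [])
            if words & set(m["memory"].lower().split())
        ]

    def get_all(self, scope_id: str) -> list[dict]:
        return list(self._store.get(scope_id, []))

    def clear(self, scope_id: str) -> None:
        self._store.pop(scope_id, None)


mem = InMemoryMemory()
mem.add([{"role": "user", "content": "Our rate limit is 100 req/min"}], "channel:C1")
hits = mem.search("rate limit", "channel:C1")
```

Because the handler only talks to the four abstract methods, a class like this can replace the Mem0-backed one in unit tests without touching orchestration code.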
The LLM layer: LangChain + any OpenAI-compatible endpoint
# llm/client.py
from langchain_openai import ChatOpenAI
from config.settings import settings

def get_llm() -> ChatOpenAI:
    return ChatOpenAI(
        base_url=settings.LLM_BASE_URL,
        api_key=settings.LLM_API_KEY,
        model=settings.LLM_MODEL,
        temperature=0.7,
        streaming=True,
    )
Only configuration changes between providers: LLM_BASE_URL, plus the matching LLM_API_KEY and LLM_MODEL. Ollama, LM Studio, OpenAI, Groq, and Together AI all work without touching any other code.
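For reference, here is a small sketch of how provider presets could be mapped onto those three settings. The `PROVIDER_BASE_URLS` table and `llm_kwargs` helper are hypothetical and not in the repo. The Ollama URL is the default from the settings file; the others are the OpenAI-compatible base URLs as I understand them, so check each provider's docs before relying on them.

```python
# Hypothetical preset table: provider name -> OpenAI-compatible base URL.
PROVIDER_BASE_URLS = {
    "ollama": "http://localhost:11434/v1",
    "lmstudio": "http://localhost:1234/v1",
    "openai": "https://api.openai.com/v1",
    "groq": "https://api.groq.com/openai/v1",
    "together": "https://api.together.xyz/v1",
}


def llm_kwargs(provider: str, api_key: str, model: str) -> dict:
    """Build the keyword arguments that differ per provider."""
    return {
        "base_url": PROVIDER_BASE_URLS[provider],
        "api_key": api_key,
        "model": model,
    }


local = llm_kwargs("ollama", "ollama", "llama3.2")
```

The resulting dict has the same keys that `get_llm()` passes to `ChatOpenAI`, so switching providers stays a pure configuration change.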
The handler: where memory meets LLM
handler.py is the heart of the bot. For every request, it:
- Checks for built-in commands
- Searches Mem0 for semantically relevant past context
- Gets recent history for conversational continuity
- Builds the LangChain message list
- Invokes the LLM
- Stores the exchange back in Mem0
# core/handler.py (simplified)
def handle_channel_mention(channel_id, user_id, text, thread_ts=None):
    scope = channel_memory.scope_id(channel_id, thread_ts)

    # Built-in commands short-circuit before hitting the LLM
    if text.strip() == "!clear":
        channel_memory.clear(scope)
        return "Memory cleared."
    if text.strip() == "!memory":
        return format_memories(channel_memory.get_all(scope))

    # Dual retrieval: semantic + recency
    relevant = channel_memory.search(text, scope)
    history = channel_memory.get_all(scope)

    messages = build_messages(system_prompt, relevant, history, text)
    response = llm.invoke(messages)
    reply = response.content

    # Store the exchange; Mem0 extracts entities and deduplicates
    channel_memory.add(
        [{"role": "user", "content": text},
         {"role": "assistant", "content": reply}],
        scope,
    )
    return reply
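The `format_memories` helper used by `!memory` isn't shown in this post. A minimal sketch of what it could look like follows, assuming each stored record is a dict with a `"memory"` key (the same shape the prompt builder reads); the wording of the reply strings is made up.

```python
def format_memories(memories: list[dict]) -> str:
    """Render stored memories as a numbered list suitable for a Slack reply."""
    # Assumption: each record carries its text under the "memory" key.
    facts = [m["memory"] for m in memories if m.get("memory")]
    if not facts:
        return "I don't have any memories for this conversation yet."
    lines = [f"{i}. {fact}" for i, fact in enumerate(facts, start=1)]
    return "Here's what I remember:\n" + "\n".join(lines)


preview = format_memories([
    {"memory": "API rate limit is 100 req/min per tenant"},
    {"id": "no-text-here"},  # skipped: no "memory" key
    {"memory": "Sprint ends Friday"},
])
```

Records without text are skipped rather than raising, so an unexpected record shape degrades to a shorter list instead of a failed command.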
Building the prompt
The message list passed to the LLM is assembled in a specific order:
from langchain_core.messages import AIMessage, HumanMessage, SystemMessage

def build_messages(system_prompt, relevant_memories, recent_history, user_input):
    messages = [SystemMessage(content=system_prompt)]

    # Inject relevant memories as a second system message
    if relevant_memories:
        memory_context = "\n".join(
            m["memory"] for m in relevant_memories if "memory" in m
        )
        messages.append(SystemMessage(
            content=f"Relevant context from earlier:\n{memory_context}"
        ))

    # Append recent history
    for entry in recent_history[-MAX_HISTORY_MESSAGES:]:
        if entry.get("role") == "user":
            messages.append(HumanMessage(content=entry["content"]))
        elif entry.get("role") == "assistant":
            messages.append(AIMessage(content=entry["content"]))

    # Current user message always last
    messages.append(HumanMessage(content=user_input))
    return messages
The two-system-message pattern keeps the bot's persona and instructions separate from the injected memory context — cleaner for the model to reason about.
Wiring up Slack
slack-bolt makes event handling clean:
# core/bot.py
from slack_bolt import App

app = App(token=settings.SLACK_BOT_TOKEN, signing_secret=settings.SLACK_SIGNING_SECRET)

@app.event("app_mention")
def on_mention(event, say):
    route_mention(event, say)  # channel / thread flow

@app.event("message")
def on_message(event, say):
    if event.get("channel_type") == "im" and not event.get("bot_id"):
        route_dm(event, say)  # DM flow, ignore bot's own messages
router.py extracts the relevant fields and calls the appropriate handler:
# core/router.py
def route_mention(event, say):
    channel_id = event.get("channel")
    thread_ts = event.get("thread_ts")
    text = event.get("text", "")
    reply = handle_channel_mention(channel_id, event["user"], text, thread_ts)
    say(text=reply, thread_ts=thread_ts or event["ts"])
Replies always go into a thread. If the mention was already in a thread, the bot stays there; otherwise thread_ts falls back to the message's own ts, so the reply starts a new thread under the mention.
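One detail the simplified snippets gloss over: the text of an app_mention event includes the mention itself (for example `<@U0123ABC> !clear`), so an exact comparison against `"!clear"` would not match until that prefix is removed. Here is a sketch of a helper that could run in the router before calling the handler; `strip_leading_mention` is a hypothetical name, not a function from the repo.

```python
import re

# Slack encodes user mentions as <@U12345>, optionally <@U12345|name>.
_LEADING_MENTION = re.compile(r"^\s*<@[A-Z0-9]+(?:\|[^>]+)?>\s*")


def strip_leading_mention(text: str) -> str:
    """Drop one leading <@USER> mention; mentions later in the text are kept."""
    return _LEADING_MENTION.sub("", text, count=1).strip()


command = strip_leading_mention("<@U0123ABC> !clear")
question = strip_leading_mention("  <@U0123ABC>   what did <@U999> say?")
```

Only the leading mention is removed, so references to other users inside the question still reach the LLM and the memory store.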
Configuration with Pydantic Settings
All config lives in one place with validation:
# config/settings.py
from pydantic_settings import BaseSettings  # assuming Pydantic v2 + pydantic-settings

class Settings(BaseSettings):
    SLACK_BOT_TOKEN: str
    SLACK_APP_TOKEN: str
    SLACK_SIGNING_SECRET: str
    LLM_BASE_URL: str = "http://localhost:11434/v1"
    LLM_API_KEY: str = "ollama"
    LLM_MODEL: str = "llama3.2"
    MEM0_API_KEY: str
    BOT_NAME: str = "Claude"
    MAX_HISTORY_MESSAGES: int = 20
    SYSTEM_PROMPT: str = ""

    class Config:
        env_file = ".env"
Missing required fields (the Slack tokens, the Mem0 key) raise a ValidationError at startup — fail fast before any event processing begins.
Running it locally
# Get dependencies
pip install -r requirements.txt
# Start the bot (Socket Mode — no public URL needed)
python main.py
That's it. No Docker, no Qdrant, no ngrok. Invite the bot to a channel, @mention it, and it starts building memory from the first message.
What "semantic memory" actually looks like in practice
Here's a realistic example. Day 1:
User: @slacktag Our API rate limit is 100 req/min per tenant. Keep that in mind for capacity planning.
Bot: Got it. I'll factor that in for any capacity discussions.
Day 3 (hundreds of messages later in the channel):
User: @slacktag We're about to onboard 5 new enterprise tenants. Any concerns?
Bot: A few things to consider: with your current API rate limit of 100 req/min per tenant, 5 new enterprise tenants could significantly increase peak load. You may want to review your rate limiting strategy before onboarding...
Mem0 surfaced the rate limit fact from Day 1 because it was semantically relevant to the capacity question — even though it was nowhere in the recent message window.
Deployment path
For production, swap SocketModeHandler for a standard HTTP adapter:
# Using Flask
from slack_bolt.adapter.flask import SlackRequestHandler
from flask import Flask, request

flask_app = Flask(__name__)
handler = SlackRequestHandler(app)

@flask_app.route("/slack/events", methods=["POST"])
def events():
    return handler.handle(request)
Point your Slack app's Request URL to https://your-domain/slack/events, deploy anywhere (Fly.io, Railway, Cloud Run — all work), and you're done. No state in the server — Mem0 holds everything.
What's next (v2 ideas)
A few extensions that would make this significantly more powerful:
Pluggable tools — tools/registry.py is stubbed out for LangChain tool integration. Adding web search (Tavily, Brave Search) or a code execution sandbox would turn this into a capable agent.
Mem0 graph memory — Mem0 supports a graph mode that tracks relationships between entities across conversations. You could map out who's on which team, what projects are in flight, and surface that context automatically.
Per-channel LLM config — let admins set a different model per channel (e.g., a powerful model for #architecture, a fast cheap model for #random).
Reaction triggers — react with 🧠 to explicitly add a message to memory; react with 🗑️ to remove a fact. Much more controllable than pure auto-extraction.
!summarize — call mem0.get_all() and ask the LLM to produce a readable summary of everything it knows about this channel.
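Of these, per-channel LLM config is probably the smallest to prototype. Here is a rough sketch of the lookup, assuming a hypothetical `CHANNEL_MODELS` mapping; the channel IDs and model names are placeholders for illustration, and the fallback matches the default LLM_MODEL from the settings.

```python
DEFAULT_MODEL = "llama3.2"  # same default as LLM_MODEL in settings

# Hypothetical mapping an admin might maintain: channel ID -> model name.
CHANNEL_MODELS = {
    "C_ARCHITECTURE": "gpt-4o",   # stronger model for design discussions
    "C_RANDOM": "llama3.2:1b",    # small, cheap model for casual chat
}


def model_for_channel(channel_id: str) -> str:
    """Return the configured model for a channel, falling back to the default."""
    return CHANNEL_MODELS.get(channel_id, DEFAULT_MODEL)


chosen = model_for_channel("C_ARCHITECTURE")
fallback = model_for_channel("C_UNKNOWN")
```

The handler could pass the result as the `model` argument when building the LLM client, leaving memory scoping untouched.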
Getting involved
The codebase is intentionally small. handler.py is ~100 lines. Every module does one thing. If you want to contribute:
git clone https://github.com/harishkotra/slacktag-oss
cd slacktag-oss
python -m venv .venv && source .venv/bin/activate
pip install -r requirements.txt
cp .env.example .env
Pick any feature from the table in the README, implement it, and open a PR. The architecture is designed to stay simple — add without entangling.
Links
- GitHub: github.com/harishkotra/slacktag-oss
- Mem0 docs: docs.mem0.ai
- Mem0 free tier: app.mem0.ai
- slack-bolt Python: slack.dev/bolt-python
- LangChain: python.langchain.com