I self-hosted OpenClaw for 4 months before switching to managed hosting. I kept a spreadsheet. Here is every number.
The Self-Hosted Setup (November 2025 — February 2026)
Infrastructure:
- Hetzner CX21: €5.39/month
- Domain + Cloudflare tunnel: ~€1/month
- Total infra: ~€6.50/month
Looks cheap, right? That is what I thought too.
API costs (the real expense):
| Month | API Spend | Notes |
|---|---|---|
| November | €35 | Light usage, getting set up |
| December | €62 | Connected Slack, Gmail, HubSpot |
| January | €140 | Agent loop overnight, burned €80 before I woke up |
| February | €48 | Added cost alerting, much more careful |
Average: €71/month. Range: €35-140. The unpredictability was the real problem.
Time costs (the hidden killer):
This is what nobody includes in self-hosting cost comparisons.
- Weeks 1-2: initial setup. Docker, config, Telegram/WhatsApp connections. ~8 hours.
- Monthly: config breaks on updates (~2 hours), WhatsApp reconnections (~1 hour), general debugging (~2 hours). About 5 hours/month ongoing.
- February CVE: discovered my instance had been listening on 0.0.0.0 (all interfaces, reachable from the internet) for 3 months, while connected to HubSpot with client data. Panic, audit, fix. ~4 hours.
At even a modest €50/hour for my time, those 5-8 hours a month are €250-400 in opportunity cost.
Real self-hosting cost: €330-540/month
Not €6.50.
The Managed Setup (March 2026 — present)
I moved to RunLobster after the CVE incident.
Cost: €49/month flat. Everything included. Credits for API usage come with the plan. No separate bills from Anthropic or OpenAI.
Time spent debugging: approximately zero. Maybe 20 minutes/month checking that everything is running, which is more habit than necessity.
Real managed cost: €49/month
What Changed Besides Cost
The cost difference alone (€49 vs €330-540) would have been enough. But the experience is also different in ways I did not expect.
Task completion: On self-hosted, multi-step tasks would stall about 30% of the time. "Pull Stripe data, compare to last week, format a report" — it would get the data and then respond with a summary instead of finishing the report. On RunLobster, the same tasks complete reliably. I think they handle model routing and task persistence differently on the backend.
Integrations: Connecting Stripe and HubSpot on self-hosted took me 2 days of API key configuration and OAuth flow debugging. On RunLobster it took about 10 minutes — the 3,000+ integrations through Composio are genuinely one-click.
Memory: Self-hosted OpenClaw forgets everything between sessions. I was re-explaining my business context constantly. RunLobster has persistent deep memory that accumulates over weeks. By week 3 it knew my target ROAS, my key clients, my team preferences. No re-explaining.
Security: Someone else handles patching, isolation, and incident response. When the CVE dropped in February, managed platforms patched within hours. Self-hosters had to figure it out themselves.
What I Lost
I want to be honest about the tradeoffs.
- Config control. I cannot tweak model temperature or system prompts as granularly. If you are the kind of person who fine-tunes every parameter, this will frustrate you.
- Model flexibility. I cannot instantly try the latest obscure model. RunLobster supports the major providers but not every niche option.
- The learning. Self-hosting taught me a lot about LLM infrastructure. That knowledge has value even though I no longer use it daily.
Who Should Self-Host
- You are an engineer who enjoys infrastructure work
- OpenClaw is a hobby/learning project, not business-critical
- You want to run local models for privacy reasons
- You have strong DevOps skills and time to maintain it
Who Should Use Managed
- OpenClaw connects to business tools (CRM, payment processors, ad platforms)
- Downtime or security incidents would cost you money or reputation
- You value predictable costs over theoretical savings
- You want to use the agent, not maintain the infrastructure
My Setup Now
I run both.
- Personal (home automation, reminders, music control): self-hosted on a Mac Mini. Free. Fun. If it breaks at 2am, nobody cares.
- Business (Stripe, HubSpot, Meta Ads, reporting): RunLobster. Cannot afford downtime or security risks with client data.
The split works. Not everything needs to be self-hosted. Not everything needs to be managed. Match the hosting to the stakes.
What is your setup? Self-hosted, managed, or a split like mine? I am genuinely curious what the community is running in 2026.
---
title: "Why Your OpenClaw Slack Agent Keeps Breaking at 3am"
published: true
tags: slack, ai, devops, productivity
canonical_url: https://slackclaw.ai/news/openclaw-slack-agent-production-failures
---
Why Your OpenClaw Slack Agent Keeps Breaking at 3am
Two months ago we deployed an OpenClaw agent in our team Slack. First week was great. Second week was fine. Third week, it started failing in ways I didn't know were possible.
The agent would go silent for hours. Or it'd reply to messages with hallucinated tool results. Once it told a colleague the deploy had succeeded when the deploy had actually timed out — because the Vercel API returned a 504 and the agent treated the error page HTML as a success response.
We've since fixed all of these. Some were our fault. Some were architectural. Here are the five failure modes we hit, in order of how much they annoyed us.
1. The Silent Death
Your agent stops responding. No errors in the logs. The gateway process is running. Slack shows the bot as online. But messages go in and nothing comes out.
Nine times out of ten, this is a token expiry. The Slack bot token expired, or the model provider API key hit its rate limit, or (our favourite) the MCP server's OAuth token to a third-party service silently expired at 2am and the agent's attempt to call the tool returned a 401 that the error handler swallowed.
The fix was embarrassingly simple once we figured it out: health check every tool on a schedule, not just the gateway. We run a cron job every 15 minutes that makes a test call to each MCP server and alerts if any return non-200. Before that, we'd only know something was broken when someone complained in Slack.
# Basic health check — hit each MCP server's health endpoint.
# Per-server ports are examples; adjust to your deployment.
declare -A ports=([linear]=8101 [notion]=8102 [github]=8103 [deploy]=8104)
for server in "${!ports[@]}"; do
  status=$(curl -s -o /dev/null -w "%{http_code}" "http://localhost:${ports[$server]}/health")
  if [ "$status" != "200" ]; then
    echo "MCP server $server is down (HTTP $status)" | tg  # tg: our Telegram alert script
  fi
done
The other silent death: disk space. The gateway logs to disk. If nobody rotates the logs and the disk fills, the gateway crashes without writing a final error — because there's nowhere to write it. We lost four hours to this one on a Saturday. Log rotation is boring right up until it isn't.
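Log rotation itself is a one-file fix. A minimal logrotate rule, assuming the gateway writes to /var/log/openclaw/gateway.log (the path and retention here are illustrative, not OpenClaw defaults):

```conf
/var/log/openclaw/gateway.log {
    daily
    rotate 7
    compress
    missingok
    notifempty
    copytruncate
}
```

copytruncate matters here: it truncates the log file in place, so a gateway that keeps its file handle open doesn't need a restart or reload signal.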
2. The Hallucinated Tool Result
This one's insidious. The agent calls a tool, the tool fails, and instead of saying "I couldn't do that," the agent fabricates a plausible-looking result.
In our case, the Linear API returned a timeout error. The agent received the error message as plain text, decided it looked like ticket data, and told the user their ticket had been updated. It hadn't.
The root cause is that most MCP servers return errors as text strings. The model sees text and does what models do: it makes sense of it. If the error message contains words like "ticket" or "updated," the model might interpret it as confirmation rather than failure.
Our fix: every MCP server response now includes a structured status field. Not "here's some text, figure it out" but {"status": "error", "code": 504, "message": "Linear API timeout"}. The agent's system prompt explicitly says: "If any tool returns status: error, tell the user the action failed and include the error message. Never infer a successful result from an error response."
This cut hallucinated results by roughly 90%. The remaining 10% come from edge cases where the tool technically succeeds but returns unexpected data. We're still chasing those.
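The envelope pattern is only a few lines of code. A sketch in Python (the function names and the 504 mapping are ours, not part of any MCP SDK):

```python
import json

def call_tool(fn, *args, **kwargs):
    """Run a tool call and always return a structured envelope,
    so the model never has to guess whether raw text is an error."""
    try:
        result = fn(*args, **kwargs)
        return {"status": "ok", "data": result}
    except TimeoutError as e:
        return {"status": "error", "code": 504, "message": str(e) or "upstream timeout"}
    except Exception as e:
        return {"status": "error", "code": 500, "message": str(e)}

def render_for_model(envelope):
    # The agent's system prompt keys off the "status" field: an error
    # envelope must never be presented as if it were tool data.
    return json.dumps(envelope)
```

The point is that the model only ever sees JSON with an explicit status, never a bare error string it could misread as ticket data.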
3. The Context Window Overflow
OpenClaw agents in Slack have a specific problem that doesn't affect CLI or web agents: conversation context grows continuously.
In a busy channel, the agent might process 50-100 messages per hour. Each message, plus any tool calls and their results, goes into the context window. Eventually the context fills, and one of two things happens: the agent starts dropping earlier messages (losing important context), or the API call fails with a token limit error.
We hit this on day 12 when the agent was watching both #engineering and #support simultaneously. Someone asked a question that required context from 3 hours of conversation, and the agent gave an answer based on the last 40 minutes because everything before that had been evicted from context.
The fix is a summarisation layer. Every 30 minutes, the agent compresses older conversation into a summary: "In the last 2 hours, the team discussed the billing migration. Key decisions: schema change approved, launch date set for Thursday. Open questions: API backward compatibility." This summary takes 200 tokens instead of 8,000. We wrote this as an MCP tool so the agent can call it when its context gets large.
On SlackClaw, this is handled automatically — the platform manages context windows per channel and runs summarisation in the background. Building it yourself is doable but you'll spend a week getting the summarisation prompts right.
4. The Permission Escalation
This wasn't a failure exactly — it was working as designed, which was the problem.
Our agent had a tool that could update Linear tickets. We'd configured it so anyone could ask the agent to update tickets, because we thought "updating a ticket" was a low-risk operation.
Then an intern asked the agent to "close all the bugs tagged P3" and it did. All 47 of them. In production.
The issue: we'd defined permissions at the tool level (can use the update tool) instead of the operation level (can update individual tickets but can't bulk close). The tool didn't distinguish between "update status of PROJ-123" and "update status of all tickets matching a filter."
After we re-opened all 47 tickets and had an awkward conversation, we rebuilt the permission system. Each tool now has operation-level permissions: read, update-single, update-bulk, create, delete. Users are mapped to permission levels. The intern can read and update individual tickets. Bulk operations require team lead approval, which the agent requests via a confirmation message in Slack.
This kind of granular permission system is one of the things that takes SlackClaw from "nice to have" to "actually necessary." Per-channel permissions aren't enough. You need per-operation permissions, and building those into every MCP server is substantial work.
5. The Model Switcheroo
We were running Claude for our agent. Anthropic had a 4-hour outage in February. Our agent was dead for all 4 hours.
The fix is obvious in hindsight: have a fallback model. But switching models isn't just changing an API endpoint. Different models handle tool calling differently. Claude returns tool calls in a specific format. GPT-5 uses a different format. Open models vary wildly. Our system prompts had Claude-specific formatting instructions baked in.
We spent a week abstracting the model layer. Now we have a config that maps model-specific tool calling formats to our internal format, and the agent can switch from Claude to GPT-5 to MiniMax M2.5 without changing anything else. The failover is automatic: if the primary model returns a 5xx three times in a row, we switch to the backup for 30 minutes before trying the primary again.
The abstraction also let us run cheaper models for simple queries (summarise this thread, look up a ticket) and expensive models for complex ones (plan this sprint based on the last two weeks of conversations). Our costs dropped about 35%.
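The cost-routing half can start as something embarrassingly crude (the keyword heuristic and model names here are illustrative, not our exact rules — a classifier model works better once you have traffic):

```python
CHEAP, EXPENSIVE = "claude-haiku", "claude-opus"  # example model names

SIMPLE_HINTS = ("summarise", "summarize", "look up", "what is", "status of")

def pick_model(query: str) -> str:
    """Short, lookup-style queries go to the cheap model;
    everything else gets the expensive one."""
    q = query.lower()
    if len(q) < 200 and any(hint in q for hint in SIMPLE_HINTS):
        return CHEAP
    return EXPENSIVE
```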
What I'd Do Differently
If I were starting over, three things.
Start with health checks, not features. Build the monitoring before you build the second MCP server. You'll catch problems in hours instead of days.
Define permissions at the operation level from day one. Tool-level permissions will bite you. It's tedious to set up but the alternative is an intern closing 47 tickets.
Abstract the model layer before you need to. You will need to switch models. Whether it's an outage, a pricing change, or a political situation (ask Anthropic about the Pentagon), your model will become unavailable at some point. Have a tested fallback.
Or just use SlackClaw and let someone else solve these problems. After two months of building and maintaining agent infrastructure, I understand why managed hosting exists. The credit-based pricing means you're paying for what you use, not maintaining what you built.
The agent infrastructure problem isn't building v1. It's keeping v1 running at 3am on a Saturday when the disk is full and the OAuth tokens have expired and nobody's awake to notice.
Helen Mireille is chief of staff at an early-stage tech startup. She writes about the gap between AI demos and AI in production.