I Stress-Tested My OpenClaw Config for 48 Hours. Here's What Actually Broke

#openclaw #ai #agents #productivity

Most OpenClaw configs get tested like this: start it up, send a few messages, call it good. If it responds in the first five minutes, it's production-ready.

I did that. Then I walked away for a weekend.

When I came back, I had three silent failures, one token bill that made me wince, and a healthy respect for what breaks when you're not watching.

Here's what I learned running my OpenClaw agent continuously for 48 hours — and the three things that failed in ways the documentation doesn't warn you about.

The Setup

I spun up a fresh OpenClaw instance with a cron job that ran every 15 minutes, a model chain (MiniMax M3 → OpenRouter fallback), and the browser tool enabled. I also wired in a simple health-check script that logged every run to a local file.

# Health check — run this as a cron every 15 min
curl -s http://localhost:18789/status | \
  jq -r '.sessionActive or .gatewayOk' > /tmp/oc-health.log
echo "$(date): $(cat /tmp/oc-health.log)" >> /tmp/oc-health-timeline.log

Nothing exotic. This is a pretty standard OpenClaw setup for someone running it 24/7.
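A timeline log only helps if something reads it. As a minimal sketch, assuming each timeline line ends with ": <value>" as written by the health check above, a small helper can pull out every run that was not healthy. The function name `unhealthy_entries` is illustrative, not part of OpenClaw.

```shell
#!/usr/bin/env bash
# Print timeline entries whose health value is anything other than "true".
# Assumes each line ends with ": <value>", as written by the health check above.
unhealthy_entries() {
  local timeline="${1:-/tmp/oc-health-timeline.log}"
  # An empty value (for example when curl could not reach the gateway)
  # does not match ": true" either, so it is reported as unhealthy too.
  grep -v ': true$' "$timeline"
}
```

Calling `unhealthy_entries` with no arguments reads the default timeline path; pass another path to check a rotated or copied log.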

Failure #1: The Context Window Was a Slow Leak

I expected the model to eventually run out of context. What I didn't expect was how slowly it would happen — and how silently.

After about 18 hours, I started seeing responses that felt... thin. Shorter. Less detailed. The model wasn't erroring. It wasn't refusing. It was just quietly adapting to less context space.

I checked the logs. The cron was running fine and the model was responding. But I hadn't set an explicit session-purge policy, so the session history kept accumulating, and every run had less of the context window left to work with. Nothing failed. The responses just got more economical.

The fix:

{
  "session": {
    "maxHistoryMessages": 50,
    "maxAgeHours": 12,
    "resetOnAge": true
  }
}

Setting maxHistoryMessages to 50 and maxAgeHours to 12 means the session resets itself before it becomes a problem. The model still has enough context to be useful, but it won't slowly strangle itself over a long weekend.
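It can help to check which of the two limits is reached first. The sketch below does the arithmetic under one loud assumption: each cron run adds two messages to the session (one prompt, one reply). Runs that make tool calls would add more. Whether hitting the message cap trims or resets the session is not something this sketch knows; it only estimates when the cap is reached. The function name `hours_until_message_cap` is hypothetical.

```shell
#!/usr/bin/env bash
# Estimate the hours until the message cap is reached.
# Arguments: max messages, cron interval in minutes, messages added per run.
# messages_per_run defaults to 2, which is an assumption (one prompt + one reply).
hours_until_message_cap() {
  local max_messages="$1" interval_min="$2" messages_per_run="${3:-2}"
  local runs=$(( max_messages / messages_per_run ))
  # Integer math for total minutes, then awk formats the result as hours.
  awk -v m="$(( runs * interval_min ))" 'BEGIN { printf "%.2f\n", m / 60 }'
}
```

With the values from this post, `hours_until_message_cap 50 15` prints `6.25`, so under that assumption the 50-message cap is reached well before the 12-hour age limit.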

Failure #2: The Cron Didn't Know the Agent Was Busy

My 15-minute cron was firing even when the previous agent run was still active. I had it set up as a standard cron that ran a research task — gather data, summarize, save.

But if the previous run took more than 15 minutes (it occasionally hit external APIs that were slow), the next run would start before the last one finished. OpenClaw handled this gracefully — it queued the job — but the queue grew faster than I expected.

After 24 hours, I had 6 queued jobs and a 40-minute backlog.

The fix wasn't obvious from the docs. I had to add a mutex guard:

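# Wrapper script: prevents concurrent cron runs
LOCKFILE="/tmp/openclaw-cron.lock"

# Open the lock file on fd 9 and try to take an exclusive lock without waiting.
# The lock is released automatically when this script exits, even after a crash.
exec 9>"$LOCKFILE"
if ! flock -n 9; then
  echo "Previous run still active, skipping"
  exit 0
fi

openclaw run --task "$(cat /tmp/oc-cron-task.md)"

If a previous run still holds the lock, flock -n fails immediately and this run is skipped. The lock goes away when the script exits, even after a crash, so there is no stale lock file to clean up. Checking the lock file's age would not work here: with a 15-minute cron, a 900-second threshold has already expired by the time the next run fires, so the guard would almost never skip anything. Note that flock ships with util-linux, so this wrapper assumes a Linux host.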

This is the kind of thing that looks fine in testing when everything responds in 3 seconds. In production, with real APIs and real latency, it matters.
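To see how often real latency pushes a run past the cron interval, the wrapper can time each run. This is a rough sketch: `timed_run`, the log path, and the OVERRUN label are illustrative names, not OpenClaw features.

```shell
#!/usr/bin/env bash
# Run a command, append its duration and exit code to a log,
# and flag any run that took at least limit_s seconds.
# Usage: timed_run <limit_seconds> <log_file> <command> [args...]
timed_run() {
  local limit_s="$1" log="$2"
  shift 2
  local start end elapsed status=0 flag=""
  start=$(date +%s)
  "$@" || status=$?          # keep the command's exit code without aborting
  end=$(date +%s)
  elapsed=$(( end - start ))
  if [ "$elapsed" -ge "$limit_s" ]; then
    flag=" OVERRUN"
  fi
  echo "$(date): ${elapsed}s exit=${status}${flag}" >> "$log"
  return "$status"
}
```

In the cron wrapper, the run line would become something like `timed_run 900 /tmp/oc-cron-duration.log openclaw run --task "$(cat /tmp/oc-cron-task.md)"`, and a later `grep OVERRUN` on that log shows how many runs exceeded 15 minutes.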

Failure #3: The Model Fallback Looked Like Success

When MiniMax M3 hit a rate limit at hour 31, OpenClaw fell back to OpenRouter. The fallback worked. The task completed.

Except it cost 4x as much per token and took 3x as long.

The logs said "OK." The cron said "OK." The task got done. But I was burning budget on expensive tokens because the fallback was invisible in the default logs.

I had to add explicit cost tracking:

# After each run, log token usage
curl -s http://localhost:18789/status | \
  jq '.model, .lastRunTokens, .lastRunCost' >> /tmp/oc-cost.log

Then a quick weekly review:

# Summarize costs per model
# Each run logs three lines (model, tokens, cost); paste joins them into one tab-separated row
paste - - - < /tmp/oc-cost.log | \
  awk -F'\t' '{gsub(/"/, ""); calls[$1]++; cost[$1] += $3}
    END {for (m in calls) print m, cost[m], "calls:" calls[m]}'

The first week I ran this, OpenRouter cost me $14.23 in token spend. I hadn't planned for that.
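A per-model total shows the damage after the fact. To notice a fallback sooner, one option is to count the logged runs that did not use the primary model. The sketch below assumes the three-lines-per-run layout (model, tokens, cost) that the jq logger above appends. The function name `fallback_runs` is hypothetical, and the primary model name is whatever string `.model` reports in your setup.

```shell
#!/usr/bin/env bash
# Count logged runs whose model differs from the expected primary.
# Usage: fallback_runs <primary_model_name> [cost_log]
# Assumes three lines per run in the log: model, tokens, cost.
fallback_runs() {
  local primary="$1" log="${2:-/tmp/oc-cost.log}"
  paste - - - < "$log" | awk -F'\t' -v primary="$primary" '
    { gsub(/"/, "") }             # jq prints the model name in quotes
    $1 != primary { n++ }         # any other model counts as a fallback
    END { print n + 0 }'
}
```

A non-zero result is a cue to look at the cost log before the bill arrives. The count could also be appended to the health timeline so fallbacks show up next to the health checks.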

What I Learned

After 48 hours, the failures weren't dramatic. The agent didn't go rogue. It didn't corrupt data. It just... quietly did the wrong thing in ways that cost me money and time.

The real lesson: OpenClaw is reliable until it's not, and the "not" usually happens outside business hours.

The three fixes I made — session reset policy, cron mutex guard, cost tracking — took about 2 hours to implement. The 48 hours of testing cost me about $20 in API tokens and one mildly stressful Sunday morning.

That's a good trade. I now have a config that runs for days without my attention. The session stays fresh. The cron doesn't pile up. And I know what I'm actually paying for.

If you're running OpenClaw in production and you're not stress-testing it for at least a full day, you're shipping config that's never met a weekend.