Lars Winstand

Posted on Jun 9 • Originally published at standardcompute.com

My Telegram bot stopped replying after OpenClaw 2026.6.1 — it was a full disk, not GPT-5

#devops #ai #openclaw #debugging

I love how quickly we all blame the interesting part of the stack.

Telegram bot goes silent? Must be GPT-5. Or Claude Opus 4.6. Or provider routing. Or some weird prompt regression. Maybe OpenClaw changed how sessions work. Maybe the model had a bad day.

And then the logs say:

ENOSPC: no space left on device, write

That was the real cause in an OpenClaw 2026.6.1 failure I was looking into this week.

The visible symptom was classic agent weirdness:

Telegram bot not replying
TUI not producing output
repeated assistant turn failed before producing content
model shown as openai/gpt-5.5
local runtime at ws://127.0.0.1:18789

If you only looked at the surface, you’d absolutely start by swapping models or debugging prompts.

Wrong first move.

The failure looked like a model problem

The runtime command looked normal:

openclaw tui - ws://127.0.0.1:18789 - agent main - session main

The visible error in the UI was vague:

[assistant turn failed before producing content]

But the actual failure was much simpler:

run error: ENOSPC: no space left on device, write

That’s not GPT-5 failing.
That’s your local runtime hitting the storage layer before the model can return anything.

Why this wastes so much debugging time

Agent failures often present at the top of the stack and originate at the bottom.

When a Telegram bot stops replying, you usually don’t get a nice banner saying:

Disk usage: 100%
SQLite writes failing
Session store corrupted

You get silence.

So people do the reasonable thing:

retry with GPT-5
retry with Claude Opus 4.6
switch providers
lower temperature
trim prompts
blame context windows

All valid tests.
Still the wrong first tests if the machine itself is unhealthy.

Long-running agents are great at slowly creating operational problems:

session history grows
logs grow
SQLite files grow
plugin state grows
Telegram history grows
local caches grow

If you’re running OpenClaw on a VPS, a tiny cloud box, a home server, or a machine you haven’t checked in months, disk is a very normal way to fail.

OpenClaw 2026.6.1 seems to expose two classes of problems

The disk issue was the obvious one.

But it wasn’t the only clue.

There were also upgrade and state warnings around plugin metadata and SQLite state, including messages like:

Left plugin install index in place because shared SQLite state has conflicting plugin install metadata for: codex

That’s the kind of warning that tells you freeing disk might not be enough.

You may also be dealing with:

partially migrated local state
plugin install metadata conflicts
provider/plugin changes after upgrade
stale SQLite state

So the sequence becomes:

bot stops replying
you free space
restart OpenClaw
it still behaves strangely
now you blame the model

Still maybe not the model.

The update may not have broken your agent — it may have exposed old mess

One useful detail from OpenClaw 2026.6.1 discussions: provider handling changed from bundled providers to plugins.

That matters a lot.

If your config expected one layout and the new version expects plugin installs plus updated config, the symptoms can look like model failure even when the real issue is local runtime setup.

A practical fix people mentioned was:

openclaw doctor

If you upgraded and didn’t run doctor, do that before you touch prompts.

Three boring failures that all look dramatic

Failure source	What it looks like
Storage exhaustion (`ENOSPC`)	Assistant fails before producing content; Telegram goes silent; writes fail in local runtime
Plugin/provider migration issues	Breakage right after upgrade; doctor warnings; missing plugins; provider config stops matching reality
Model/context config mismatch	Errors like `context too large`; execution failures caused by bad config rather than model quality

This is the pattern I think more agent teams need to internalize:

Check the machine first.
Check local state second.
Check migrations third.
Then start blaming models.

What I’d check first when a Telegram bot goes silent

Here’s the order I’d use.

1) Check disk space immediately

df -h

If you want to find the obvious offenders:

du -sh ./* 2>/dev/null | sort -h

Or for system-wide pain points:

sudo du -xh / | sort -h | tail -50

Things worth checking:

OpenClaw session storage
SQLite database files
logs
cache directories
Telegram-related state
temp files

If you see ENOSPC, stop debugging prompts. Fix storage first.

2) Run OpenClaw doctor

openclaw doctor

Especially after upgrading to 2026.6.1 or later.

If OpenClaw moved providers to plugins and your old config still assumes bundled providers, doctor is likely to tell you faster than trial-and-error will.

3) Look for migration and plugin warnings

Search logs for anything involving:

SQLite
migration
plugin
metadata
codex
provider

Examples of the kind of thing that matters:

conflicting plugin install metadata
legacy migration behavior
missing provider plugin

If those show up after an upgrade, don’t assume the state store is trustworthy.

4) Verify provider and model config

Make sure the provider plugins you actually installed match what your config references.

Also verify context settings.

If OpenClaw thinks a model supports one context size and the provider setup says otherwise, you can get failures that look like model instability but are really config mismatch.

5) Only now test prompts and model selection

Once the machine is healthy and the local state is sane, then it makes sense to compare:

GPT-5
Claude Opus 4.6
Grok 4.20
Qwen variants
Llama variants

This is also where having an OpenAI-compatible endpoint helps. If your app can switch providers without rewriting your integration, isolating model-vs-runtime issues gets much easier.

That’s one reason I like the drop-in API approach Standard Compute takes: you can keep your existing OpenAI SDK or HTTP client, swap the backend, and test whether the problem is model routing or your local runtime without rebuilding the app. More importantly, if you’re running agents 24/7, flat-rate compute means you can do that testing without watching token spend every minute.

Sometimes it really is the model config

To be fair, not every issue here is disk or migration state.

There were also reports around context too large after updating.

That’s real.

But even then, I’d still classify it as a configuration problem before I’d call it a model problem.

There’s a big difference between:

“Claude got worse”
“GPT-5 is flaky”

and:

“my runtime registered the wrong context size”
“my provider plugin setup no longer matches config”

One is model blame.
The other is operations.

Most of the time, operations wins.

Minimal debugging checklist

If I were writing the incident note, it would be this:

If Telegram bot stops replying after an OpenClaw update:

1. Check disk space
2. Search logs for ENOSPC
3. Run `openclaw doctor`
4. Inspect migration/plugin warnings
5. Verify provider plugin installation
6. Verify model/context config
7. Only then compare models or prompts

That order saves hours.

The unsexy lesson

If your agent dies right after an upgrade, assume boring infrastructure first.

Not because models never fail.
Because local failures are much more common than people want to admit.

The smarter the stack gets, the more embarrassing the outages become.

A Telegram bot running through OpenClaw, talking to openai/gpt-5.5, connected over ws://127.0.0.1:18789, can still be taken down by the least glamorous error in computing:

no space left on device

That’s good news, honestly.

Boring problems are fixable.

And if you’re building long-running agents in OpenClaw, n8n, Make, Zapier, or custom loops, this is the operational habit worth keeping:

Models second. Machine first.

If the runtime can’t write to disk, GPT-5 never even gets a chance to be wrong.

DEV Community