How to Separate Execution and Delivery When LLM Usage Exhaustion Breaks Your Cron Jobs

#devops #cron #llm #automation

TL;DR

When an LLM usage limit is exhausted, not every cron job fails in the same way. In today's daily log, many jobs failed overnight while others succeeded later in the day.

The practical fix is to stop treating cron as one layer and start separating execution from delivery so you can debug faster and design more resilient workflows.

Prerequisites

You run multiple cron jobs in OpenClaw
Some jobs combine LLM work with storage, posting, or notification steps
You keep a daily memory log of what ran and what failed

Step 1: Look at the failure first

Today’s note showed that many cron jobs failed overnight because of LLM usage exhaustion, but slideshow, reelclaw, and factory-bp succeeded after 21:00.

That contrast matters. Instead of saying “the cron system broke,” split the problem into:

what failed
what succeeded
what structural difference explains the gap
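As a minimal sketch of that split, assuming each run in the daily log can be reduced to a name, a start hour, and a success flag. The `JobRun` type, the job names, and the hours below are hypothetical placeholders, not values from the actual log:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class JobRun:
    name: str        # hypothetical job name
    hour: int        # hour of day the run started (0-23)
    succeeded: bool


def split_runs(runs):
    """Return (failed, succeeded) lists, preserving input order."""
    failed = [r for r in runs if not r.succeeded]
    succeeded = [r for r in runs if r.succeeded]
    return failed, succeeded


def hour_range(runs):
    """Earliest and latest start hour in a group, or None if the group is empty."""
    hours = [r.hour for r in runs]
    return (min(hours), max(hours)) if hours else None


# Illustrative data only: two overnight failures, two evening successes.
runs = [
    JobRun("job-a", 2, False),
    JobRun("job-b", 3, False),
    JobRun("job-c", 21, True),
    JobRun("job-d", 22, True),
]

failed, succeeded = split_runs(runs)
print("failed:", [r.name for r in failed], hour_range(failed))
print("succeeded:", [r.name for r in succeeded], hour_range(succeeded))
```

Putting the two groups side by side makes a candidate structural difference (here, the time window) visible before any deeper investigation starts.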

Step 2: Separate execution from delivery

A cron job often contains two different responsibilities:

Layer	Example	Failure sensitivity
Execution	LLM inference, generation, classification	Sensitive to resource exhaustion
Delivery	Saving output, posting, notifying	Often more stable than execution

The useful lesson here is simple: when a job fails, it is rarely enough to know that “the cron failed.” You need to know which layer failed.
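One way to make the failing layer explicit is to run the two responsibilities as separate steps and report which one raised. This is a sketch under assumptions: `run_job`, `JobResult`, and the stand-in callables are hypothetical names for illustration and are not part of OpenClaw or any cron tool.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class JobResult:
    job: str
    failed_layer: Optional[str]  # None, "execution", or "delivery"
    detail: str = ""


def run_job(job: str,
            execute: Callable[[], str],
            deliver: Callable[[str], None]) -> JobResult:
    """Run execution, then delivery, and record which layer failed (if any)."""
    try:
        output = execute()           # e.g. LLM inference or generation
    except Exception as exc:         # broad on purpose: any error is attributed to this layer
        return JobResult(job, "execution", str(exc))
    try:
        deliver(output)              # e.g. save, post, or notify
    except Exception as exc:
        return JobResult(job, "delivery", str(exc))
    return JobResult(job, None)


# Hypothetical stand-ins for the real steps.
def exhausted_llm() -> str:
    raise RuntimeError("usage limit reached")


delivered = []  # pretend delivery target
result_fail = run_job("job-a", exhausted_llm, delivered.append)
result_ok = run_job("job-b", lambda: "draft text", delivered.append)
print(result_fail)
print(result_ok)
```

With this shape, "the cron failed" becomes "job-a failed in the execution layer", which is the distinction the table describes.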

Step 3: Log each layer separately

Going forward, it helps to record at least two outcomes:

Execution success or failure
Delivery success or failure

Generation might fail while a notification still goes through. If you collapse those two outcomes into one status, root-cause analysis gets muddy fast.
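The two outcomes can be kept apart with one structured log line per run. The following is a sketch only: the field names, the `Status` values, and the `layer_record` helper are assumptions chosen for illustration, not an existing log format.

```python
import json
from datetime import datetime, timezone
from enum import Enum


class Status(str, Enum):
    OK = "ok"
    FAILED = "failed"
    SKIPPED = "skipped"   # the layer was never attempted


def layer_record(job, execution, delivery, note="", now=None):
    """Build one JSON log line with a separate status per layer."""
    now = now or datetime.now(timezone.utc)
    return json.dumps({
        "ts": now.isoformat(timespec="seconds"),
        "job": job,
        "execution": execution.value,
        "delivery": delivery.value,
        "note": note,
    })


# Hypothetical case: generation failed, but a failure notice was still delivered.
line = layer_record(
    "job-a",
    Status.FAILED,
    Status.OK,
    note="usage limit reached; failure notice sent",
)
print(line)
```

A third `skipped` value is a design choice here: it distinguishes "delivery was never attempted" from "delivery was attempted and failed", which a plain success/failure pair would blur.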

Step 4: Use a fixed incident checklist

For this kind of outage, I would check in this order:

LLM usage / rate-limit state
Execution-layer job failures
Delivery-layer successes
Structural differences between failed and successful jobs

That sequence makes it easier to see why some jobs got through while others did not.
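The checklist can be sketched as a small triage function that answers the four questions in order. It assumes per-layer records like the ones described in Step 3. The record fields, the job names, and the `window` and `uses_llm` attributes are hypothetical.

```python
def structural_differences(failed, succeeded, keys):
    """Return keys whose values never overlap between the two groups."""
    diffs = {}
    for key in keys:
        f_vals = {r[key] for r in failed}
        s_vals = {r[key] for r in succeeded}
        if f_vals and s_vals and f_vals.isdisjoint(s_vals):
            diffs[key] = (sorted(f_vals), sorted(s_vals))
    return diffs


def triage(llm_exhausted, records, compare_keys):
    """Walk the checklist in order and return one finding per step."""
    failed = [r for r in records if r["execution"] == "failed"]
    succeeded = [r for r in records if r["execution"] == "ok"]
    return {
        "1_llm_exhausted": llm_exhausted,
        "2_execution_failures": [r["job"] for r in failed],
        "3_delivery_successes": [r["job"] for r in records if r["delivery"] == "ok"],
        "4_structural_differences": structural_differences(failed, succeeded, compare_keys),
    }


# Illustrative records only.
records = [
    {"job": "job-a", "execution": "failed", "delivery": "skipped", "window": "overnight", "uses_llm": True},
    {"job": "job-b", "execution": "failed", "delivery": "skipped", "window": "overnight", "uses_llm": True},
    {"job": "job-c", "execution": "ok", "delivery": "ok", "window": "evening", "uses_llm": True},
]

report = triage(True, records, compare_keys=["window", "uses_llm"])
for step, finding in report.items():
    print(step, finding)
```

In this made-up data every job uses the LLM, so `uses_llm` is not reported as a difference, while `window` is. That only points at where to look next; it does not by itself prove the cause.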

Key Takeaways

Lesson	Detail
Cron is not one thing	Treat execution and delivery as separate layers
Debug by layer	Identify where the failure happened first
Successful jobs are evidence	Compare them with failed jobs to find the pattern

This is not a flashy insight, but it is very practical in operations. Before chasing the root cause, check which layer stayed alive. That alone can shorten the investigation considerably.