Last month I watched an agent rack up $40 in API calls.
The loop was supposed to run until the model finished. The stop condition was simple:
if response.stop_reason == "end_turn":
    break
The model never returned "end_turn". It kept hitting "max_tokens" on every turn, generating partial output, then looping back and trying again. Nothing in the loop caught that. No iteration cap. No cost ceiling. No time limit. Just a condition that never fired.
Forty dollars and 200 iterations later, I manually killed the process.
This is not a rare story. It is the default story. Agent loops are full of conditions that seemed reasonable when written but fail silently in the wrong situation. The fix is not to write more conditions. The fix is to build the exit strategy first.
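To make the failure concrete, here is a minimal sketch of the same loop with a hard iteration cap added. `FakeResponse` and `fake_llm` are hypothetical stand-ins for a real client; only the `stop_reason` values come from the story above.

```python
from dataclasses import dataclass


@dataclass
class FakeResponse:
    # Hypothetical stand-in for an API response object.
    stop_reason: str
    text: str


def fake_llm(messages):
    # Simulates the failure: every turn is truncated, never "end_turn".
    return FakeResponse(stop_reason="max_tokens", text="partial output")


def run_loop(max_iters=20):
    messages = []
    for iteration in range(1, max_iters + 1):
        response = fake_llm(messages)
        messages.append(response.text)
        if response.stop_reason == "end_turn":
            return "end_turn", iteration
    # With this fake model, the cap is the only exit that ever fires.
    return "max_iters", max_iters


reason, turns = run_loop()
print(reason, turns)
```

Without the `max_iters` bound, this loop would never return, which is the situation described above.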
The Shape of the Fix
llm-stop-conditions is a small Python library for composing named stop conditions into a single evaluator. You define what should stop the loop, combine them, and get back a StopReason when one fires.
Install:
pip install llm-stop-conditions
The basics:
from llm_stop_conditions import MaxIters, MaxUsd, MaxSeconds, AnyOf, Evaluator

# Stop on whichever limit is reached first.
stopper = Evaluator(AnyOf(
    MaxIters(20),
    MaxUsd(2.0),
    MaxSeconds(60),
))

# call_llm and get_cost are your own helpers; the library does not provide them.
for _ in range(1000):
    response = call_llm(messages)
    messages.append(response)
    result = stopper.evaluate(
        iteration=stopper.iteration,
        cost_usd=get_cost(response),
        elapsed_seconds=stopper.elapsed(),
        response=response,
    )
    if result.should_stop:
        print(f"Stopped: {result.reason}")
        break
The evaluator tracks state internally. You pass in the current values on each turn. When any condition in AnyOf fires, should_stop is True and reason names the condition that triggered.
You can also use AllOf for cases where all conditions must be true before stopping. That is less common but useful for things like "stop only when both the model says it is done and fewer than 5 tokens were returned."
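One way to picture how the two combinators behave is the standalone sketch below. It is not the library's source. `Ctx`, `max_iters`, `max_usd`, `any_of`, and `all_of` are hypothetical names, and conditions are modelled as plain functions that return a reason string or `None`.

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Ctx:
    # Hypothetical snapshot of loop state passed to each condition.
    iteration: int = 0
    cost_usd: float = 0.0
    says_done: bool = False
    output_tokens: int = 0


Condition = Callable[[Ctx], Optional[str]]


def max_iters(limit: int) -> Condition:
    return lambda ctx: f"max_iters({limit})" if ctx.iteration >= limit else None


def max_usd(limit: float) -> Condition:
    return lambda ctx: f"max_usd({limit})" if ctx.cost_usd >= limit else None


def any_of(*conds: Condition) -> Condition:
    # Fires as soon as one child fires and reports that child's reason.
    def check(ctx: Ctx) -> Optional[str]:
        for cond in conds:
            reason = cond(ctx)
            if reason is not None:
                return reason
        return None
    return check


def all_of(*conds: Condition) -> Condition:
    # Fires only when every child fires; joins their reasons.
    def check(ctx: Ctx) -> Optional[str]:
        fired = [r for r in (cond(ctx) for cond in conds) if r is not None]
        if conds and len(fired) == len(conds):
            return " and ".join(fired)
        return None
    return check


def says_done(ctx: Ctx) -> Optional[str]:
    return "says_done" if ctx.says_done else None


def short_reply(ctx: Ctx) -> Optional[str]:
    return "under_5_tokens" if ctx.output_tokens < 5 else None


guard = any_of(max_iters(20), max_usd(2.0))
finished = all_of(says_done, short_reply)

print(guard(Ctx(iteration=3, cost_usd=2.5)))
print(finished(Ctx(says_done=True, output_tokens=40)))
print(finished(Ctx(says_done=True, output_tokens=2)))
```

In this sketch the "any" form is the safety net and the "all" form is the stricter definition of done, which mirrors the two uses described above.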
Individual conditions are usable on their own:
from llm_stop_conditions import NoProgress, MaxTokens, Custom

# stop after 50,000 tokens total
token_cap = MaxTokens(50_000)

# stop if the model seems to be spinning
no_progress = NoProgress(n_turns=3)

# custom: stop if the response contains a sentinel phrase
def done_phrase(ctx):
    return "TASK_COMPLETE" in ctx.last_response_text

done = Custom(done_phrase)
Compose them however you need:
stopper = Evaluator(AnyOf(
    MaxIters(20),
    MaxUsd(2.0),
    MaxSeconds(60),
    no_progress,
    done,
))
What It Does NOT Do
A few things this library deliberately leaves out:
- It does not make LLM calls. It evaluates what you pass in. You own the HTTP client.
- It does not patch or wrap your LLM client. There is no monkey-patching, no decorator magic.
- It does not parse API responses for you. You extract cost and token counts and pass them in as plain numbers.
- It does not retry, fall back, or recover. Stopping is its job. Recovery is for llm-retry or llm-fallback-router.
This keeps the library at zero dependencies and keeps it testable in isolation.
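Because cost extraction is left to the caller, a sketch of that step may help. The per-million-token prices below are placeholders, not any provider's real rates, and `usage_to_usd` is a hypothetical helper rather than part of the library.

```python
# Placeholder prices in USD per million tokens; substitute your provider's rates.
INPUT_USD_PER_MTOK = 3.00
OUTPUT_USD_PER_MTOK = 15.00


def usage_to_usd(input_tokens: int, output_tokens: int) -> float:
    """Convert the token counts for one call into a dollar amount."""
    return (
        input_tokens * INPUT_USD_PER_MTOK
        + output_tokens * OUTPUT_USD_PER_MTOK
    ) / 1_000_000


# Two calls' worth of (input_tokens, output_tokens), as you might read
# them from your client's usage fields.
total_usd = 0.0
for input_tokens, output_tokens in [(1_200, 400), (1_900, 350)]:
    total_usd += usage_to_usd(input_tokens, output_tokens)

print(round(total_usd, 6))
```

The result is a plain float, which is the kind of value the evaluator expects to receive as cost_usd.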
Inside the Lib: The NoProgress Heuristic
The most interesting condition is NoProgress. The others are counters with thresholds. This one tries to detect something fuzzy.
The question it answers: is the model spinning in place, producing output that looks like progress but is not?
The definition used here is: each of the last N consecutive assistant messages contained no tool calls AND was shorter than 50 tokens.
Why these two signals together?
Tool calls are work. If the model is doing something useful, it is usually invoking tools. A turn with no tool calls and very short text is either genuine completion or a model that has lost the thread and is producing filler. After N consecutive turns like that, it is almost certainly the latter.
The 50-token threshold is configurable:
NoProgress(n_turns=3, min_tokens=100)
Setting min_tokens=0 makes it trigger only on the absence of tool calls, ignoring response length. Setting n_turns=1 makes it aggressive. The defaults are conservative on purpose.
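The heuristic can be sketched in a few lines. This is a re-creation from the description above, not the library's code. `NoProgressSketch` and `observe` are hypothetical names, and treating `min_tokens=0` as "skip the length check" is an assumption made to match the behavior described.

```python
from dataclasses import dataclass


@dataclass
class NoProgressSketch:
    # Hypothetical re-creation of the heuristic, not the library's class.
    n_turns: int = 3
    min_tokens: int = 50
    streak: int = 0

    def observe(self, tool_calls: int, output_tokens: int) -> bool:
        """Record one assistant turn; return True when the loop should stop."""
        # Assumption: min_tokens=0 disables the length check entirely.
        is_short = self.min_tokens == 0 or output_tokens < self.min_tokens
        if tool_calls == 0 and is_short:
            self.streak += 1
        else:
            self.streak = 0  # a tool call or a long reply resets the count
        return self.streak >= self.n_turns


detector = NoProgressSketch(n_turns=3)
turns = [(1, 20), (0, 12), (0, 8), (0, 30)]  # (tool_calls, output_tokens)
results = [detector.observe(calls, tokens) for calls, tokens in turns]
print(results)
```

The reset on any productive turn is what makes the count "consecutive": one tool call in the middle of a quiet stretch starts the streak over.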
One thing this does not detect: a model that keeps calling the same tool over and over. That is a loop problem, not a progress problem. tool-loop-guard handles that case.
When This Is Useful
Agent loops that run for an unknown number of turns. Any time the exit condition is "when the model says it is done" you should also have a fallback that fires when the model does not say it is done.
Production systems where runaway loops have a real cost. The MaxUsd condition is not about testing. It is about capping spend when something goes wrong at 3am.
Debugging. When your loop exits unexpectedly, result.reason tells you exactly which condition fired and what the value was at that point. That is easier to work with than a blank screen after a process dies.
Multi-step pipelines. If one stage of your pipeline spins, you want the stage to stop cleanly and report why, not consume the rest of your rate limit.
When NOT to Use This
For simple scripts that run a fixed number of turns, this is overkill. A for i in range(10) loop is fine.
If your loop is a single call with no iteration, there is nothing to compose. This is for loops.
If you need the stop condition to also trigger retries or switching providers, combine this with llm-retry or llm-fallback-chain. Stopping and recovering are separate concerns.
Install
pip install llm-stop-conditions
Zero dependencies. Python 3.9+.
Source: MukundaKatta/llm-stop-conditions
29 tests covering each condition, AnyOf, AllOf, edge cases for NoProgress, and evaluation after reset.
Siblings
These libraries address adjacent boundaries in the same agent loop:
| Lib | Boundary | Repo |
|---|---|---|
| tool-call-budgets | Per-tool call-count cap, stops runaway tool use | MukundaKatta/tool-call-budgets |
| token-budget-py | Shared token/USD pool across concurrent agents | MukundaKatta/token-budget-py |
| agent-deadline | Cooperative time deadline with check_or_raise | MukundaKatta/agent-deadline |
| llm-circuit-breaker-py | Error-rate breaker that opens after N failures | MukundaKatta/llm-circuit-breaker-py |
The difference between agent-deadline and MaxSeconds is cooperative vs. evaluative. agent-deadline raises an exception wherever your code calls check_or_raise after the deadline has passed, including in the middle of an iteration. MaxSeconds in this library is checked only when you call evaluate in your loop, so it cannot interrupt a turn in progress; it only prevents the next one. Both are useful. They solve slightly different shapes of the same problem.
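The sketch below illustrates that difference with a fake clock so the timing is deterministic. `Deadline`, `DeadlineExceeded`, and `FakeClock` are hypothetical stand-ins written for this example. Only the name `check_or_raise` is taken from the table above, and nothing here reflects the real API of either library.

```python
class DeadlineExceeded(Exception):
    pass


class FakeClock:
    # Deterministic clock for the demo; real code would use time.monotonic.
    def __init__(self):
        self.now = 0.0

    def __call__(self):
        return self.now


class Deadline:
    # Hypothetical stand-in for a cooperative deadline object.
    def __init__(self, seconds, clock):
        self.clock = clock
        self.expires_at = clock() + seconds

    def expired(self):
        return self.clock() >= self.expires_at

    def check_or_raise(self):
        if self.expired():
            raise DeadlineExceeded("deadline passed")


def slow_step(clock, deadline=None):
    # One loop iteration made of three 4-second sub-steps.
    for _ in range(3):
        clock.now += 4.0
        if deadline is not None:
            deadline.check_or_raise()  # cooperative: can abort mid-iteration


def run_evaluative(limit):
    clock = FakeClock()
    deadline = Deadline(limit, clock)
    while True:
        slow_step(clock)        # always runs to completion
        if deadline.expired():  # checked once per iteration
            return clock.now


def run_cooperative(limit):
    clock = FakeClock()
    deadline = Deadline(limit, clock)
    try:
        while True:
            slow_step(clock, deadline)
    except DeadlineExceeded:
        return clock.now


print(run_evaluative(6.0), run_cooperative(6.0))
```

With a 6-second limit, the evaluative version finishes its 12-second iteration before stopping, while the cooperative version stops at the 8-second mark inside the iteration.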
What Is Next
A few things that would improve this library:
Async evaluator support. The current evaluator is synchronous. An async variant would let you check conditions concurrently with in-flight requests in some architectures.
Serializable state. If you want to resume a long-running agent from a checkpoint, the evaluator state needs to serialize. That pairs naturally with agent-resume.
A FirstOf alias for AnyOf with clearer intent when only one condition is expected. Minor, but readability matters in code you will read at 2am when something is wrong.
Persistent stop logs. A StopLog that appends every StopReason to a JSONL file would make post-run debugging much easier on long jobs.
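As a rough idea of what such a log could look like, here is a sketch that appends one JSON object per line. The function names and the record fields are hypothetical, since no StopLog exists in the library yet.

```python
import json
import os
import tempfile
import time


def append_stop_reason(path, condition, value, iteration):
    # Hypothetical record shape: one JSON object per line (JSONL).
    record = {
        "ts": time.time(),
        "condition": condition,
        "value": value,
        "iteration": iteration,
    }
    with open(path, "a", encoding="utf-8") as f:
        f.write(json.dumps(record) + "\n")


def read_stop_log(path):
    # Read the log back as a list of dicts, skipping blank lines.
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]


# Demo against a throwaway directory.
log_path = os.path.join(tempfile.mkdtemp(), "stops.jsonl")
append_stop_reason(log_path, "MaxUsd", 2.03, 14)
append_stop_reason(log_path, "MaxIters", 20, 20)
entries = read_stop_log(log_path)
print([entry["condition"] for entry in entries])
```

Appending rather than rewriting the file means a crash mid-run loses at most the record being written.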
The $40 bill was an education. This library is the result. Write the exit strategy before you write the loop.
Part of the @mukundakatta agent tooling stack, built for the Hermes Agent Challenge.