While building AI agents, I kept running into the same uncomfortable question:
How do I guarantee an agent's execution will stop?
Not “usually stop.”
Not “log when it goes wrong.”
But actually guarantee it won't run forever, retry endlessly, or burn money in a loop.
Most agent frameworks focus on reasoning quality.
I was more worried about runaway execution.
That’s what led me to build AgenWatch.
The real problem with AI agents
If you’ve worked with agents, you’ve probably seen this:
- Infinite reasoning loops
- Silent retries
- Budget overruns discovered after the damage
- Tools being called repeatedly because the model “tries again”
Observability helps explain what happened.
It does nothing to stop it.
I didn’t want better logs.
I wanted runtime enforcement.
The idea: Treat agent execution like an operating system problem
In operating systems, we don’t trust processes to behave correctly.
We enforce limits:
- CPU time
- Memory
- Permissions
I applied the same idea to AI agents.
Instead of trusting the LLM to stop, I built a runtime execution kernel that decides:
- whether a step is allowed
- whether a tool can be called
- whether execution must halt
That kernel became AgenWatch.
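To make the analogy concrete, here is a rough conceptual sketch of what "decide before execute" means. This is not AgenWatch's internal code; ExecutionKernel, LimitExceeded, authorize_step and record_step are names made up purely for illustration.

# Conceptual sketch only -- not AgenWatch's actual internals or API.
class LimitExceeded(Exception):
    """Raised when the next step would violate a hard limit."""

class ExecutionKernel:
    def __init__(self, budget: float, max_iterations: int):
        self.budget = budget                  # hard spend limit
        self.max_iterations = max_iterations  # hard step limit
        self.spent = 0.0
        self.iterations = 0

    def authorize_step(self, estimated_cost: float) -> None:
        # The check runs BEFORE the model or tool call, so a bad step never executes.
        if self.iterations + 1 > self.max_iterations:
            raise LimitExceeded("iteration limit reached")
        if self.spent + estimated_cost > self.budget:
            raise LimitExceeded("budget would be exceeded")

    def record_step(self, actual_cost: float) -> None:
        self.iterations += 1
        self.spent += actual_cost

The agent loop asks for permission before every model or tool call; if permission is denied, execution halts instead of quietly trying again.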
What AgenWatch is (and is not)
AgenWatch is:
- A runtime execution kernel for AI agents
- A bounded execution controller
- A governance layer that enforces limits before execution
AgenWatch is not:
- An agent framework
- A prompt engineering tool
- An observability dashboard
- A replacement for LangChain or CrewAI
A minimal AgenWatch example
This is a basic example showing runtime budget and iteration enforcement.
import os

from agenwatch import Agent, tool
from agenwatch.providers import OpenAIProvider

# A single, deliberately simple tool the agent is allowed to call
@tool("Echo input text")
def echo(**kwargs) -> dict:
    text = kwargs.get("text", "")
    return {"echo": text}

# Hard limits are declared up front; the kernel enforces them before each call
agent = Agent(
    tools=[echo],
    llm=OpenAIProvider(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4o-mini"
    ),
    budget=1.0,
    max_iterations=5
)

result = agent.run("Echo hello")

print(f"Success: {result.success}")
print(f"Cost: {result.cost}")
print(f"Output: {result.output}")
If the budget or iteration limit is exceeded, the kernel blocks the next call before it executes.
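To exercise the enforcement path rather than the happy path, you can tighten the limits. How a halted run is reported may vary by version; the sketch below reuses the echo tool and provider from above and assumes, based on the result fields shown there, that a blocked run comes back with success set to False rather than raising.

# Give the agent almost no room to work (values chosen to force a halt)
strict_agent = Agent(
    tools=[echo],
    llm=OpenAIProvider(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4o-mini"
    ),
    budget=0.0001,      # effectively no spend allowed
    max_iterations=1    # at most one step
)

result = strict_agent.run("Echo hello, then echo goodbye, then summarize both")

# Assumption: a run halted by the kernel reports success=False
if not result.success:
    print(f"Halted by the kernel, cost so far: {result.cost}")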
Using LangChain with AgenWatch
LangChain can generate tasks and prompts.
AgenWatch governs execution.
import os

from langchain_core.prompts import ChatPromptTemplate

from agenwatch import Agent, tool
from agenwatch.providers import OpenAIProvider

@tool("Echo text safely")
def echo(**kwargs) -> dict:
    return {"echo": kwargs.get("text", "")}

agent = Agent(
    tools=[echo],
    llm=OpenAIProvider(
        api_key=os.getenv("OPENAI_API_KEY"),
        model="gpt-4o-mini"
    ),
    budget=1.0,
    max_iterations=3
)

# LangChain builds the task text; AgenWatch decides whether each step may run
prompt = ChatPromptTemplate.from_messages([
    ("human", "Say hello using the echo tool")
])
task = prompt.format_messages()[0].content

result = agent.run(task)
print(result.success, result.cost, result.output)
LangChain handles what to do.
AgenWatch enforces whether it’s allowed to continue.
What AgenWatch does NOT do (by design)
In v0.1.x, AgenWatch:
- Does not persist execution state to disk
- Does not resume after process crashes
- Does not roll back external side effects
- Does not sandbox the OS or subprocesses
If a hard limit is hit mid-execution, AgenWatch freezes and reports.
Rollback is an orchestration concern, not a kernel concern.
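In practice, that means compensation lives in the code that calls the agent. Here is a minimal sketch of that split, using only the agent.run() call and result fields shown earlier; the compensate callback is a hypothetical stand-in for whatever cleanup your system needs.

from typing import Callable

def run_governed_task(agent, task: str, compensate: Callable[[str], None]) -> bool:
    # AgenWatch guarantees the run stops; deciding what to do about
    # partial work belongs to the caller.
    result = agent.run(task)
    if result.success:
        return True

    # Application-level cleanup supplied by the orchestrator --
    # not something AgenWatch performs.
    compensate(task)
    print(f"Agent halted, cost so far: {result.cost}")
    return False

# e.g. run_governed_task(agent, "Echo hello", compensate=lambda task: None)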
Why I’m sharing this
I built AgenWatch because I needed hard execution guarantees, not better explanations after failure.
It’s early.
It’s intentionally narrow.
But it already solved a real production problem for me.
If you’re building agents and care about:
- cost control
- safety
- deterministic stopping
you might find it useful.
GitHub: https://github.com/agenwatch/agenwatch
PyPI: https://pypi.org/project/agenwatch/