Cooperative Agent Deadlines: Give Every Task a Wall-Clock Time Budget

#hermeschallenge #ai #python #agents

The SLA is 30 seconds. The agent started 28 seconds ago. It is in the middle of its third tool call. There is no mechanism to stop it.

The agent finishes at 47 seconds. The user sees a timeout error from your load balancer. The agent's work is wasted. Worse, the agent might still be running when the next request for the same user arrives.

agent-deadline is a cooperative wall-clock deadline: the agent checks the deadline before each step and stops before time runs out.

The Shape of the Fix

from agent_deadline import Deadline, DeadlineExceeded

deadline = Deadline(seconds=25)  # 25-second budget, 5s before the SLA

while True:
    if deadline.exceeded():
        raise DeadlineExceeded(elapsed=deadline.elapsed(), budget=25)

    response = call_llm(messages)
    messages.append({"role": "assistant", "content": response.content})

    if response.stop_reason == "end_turn":
        return extract_text(response)

    for tool_call in response.content:
        if tool_call.type == "tool_use":
            deadline.check()  # Raises DeadlineExceeded if time is up
            result = execute_tool(tool_call.name, tool_call.input)
            messages.append({"role": "user", "content": [...]})

The agent checks the deadline at natural pause points: before each LLM call and before each tool execution. When time is up, DeadlineExceeded propagates up to the caller cleanly.

What It Does NOT Do

agent-deadline does not forcibly terminate execution. It is cooperative: the agent checks the deadline voluntarily. If the agent does not call check() or exceeded(), the deadline has no effect. A tool that runs for 30 seconds without returning cannot be interrupted by this library.

For forcible timeout of synchronous tool calls, use tool-timeout-wrap, which runs the tool in a thread with a concurrent.futures.ThreadPoolExecutor and cancels it via the future's timeout.

It does not automatically propagate deadlines to nested calls. If your tool itself calls another agent or makes a network request, the deadline does not automatically apply to those nested calls. You need to pass the deadline or compute a sub-budget explicitly.

Inside the Library

The deadline is a monotonic start time plus a budget:

import time

class Deadline:
    def __init__(self, seconds: float):
        self._start = time.monotonic()
        self._budget = seconds

    @property
    def elapsed(self) -> float:
        return time.monotonic() - self._start

    @property
    def remaining(self) -> float:
        return max(0.0, self._budget - self.elapsed)

    def exceeded(self) -> bool:
        return self.elapsed >= self._budget

    def check(self) -> None:
        if self.exceeded():
            raise DeadlineExceeded(
                elapsed=self.elapsed,
                budget=self._budget,
                remaining=0.0,
            )

    def budget_for_step(self, fraction: float) -> float:
        """Return a sub-budget for a nested call (fraction of remaining time)."""
        return self.remaining * fraction

    def as_context(self) -> str:
        """Return a human-readable deadline hint for injection into prompts."""
        r = self.remaining
        if r <= 5:
            return f"URGENT: Only {r:.0f} seconds remaining. Provide a concise answer now."
        elif r <= 15:
            return f"Time constraint: {r:.0f} seconds remaining. Be concise."
        return ""

The as_context() method is a useful addition: when you are 80% through the deadline, inject a prompt reminder that time is short. Many models will respond with shorter answers when told they need to be concise.

The budget_for_step() helper computes a sub-budget for nested calls:

# If 10 seconds remain and you want to give a nested call 50% of that:
nested_deadline = Deadline(seconds=deadline.budget_for_step(0.5))

This prevents a single slow nested call from consuming the entire remaining budget.

When to Use It

Use it whenever your agent must meet an SLA. Any user-facing agent that is behind a load balancer or reverse proxy with a timeout should have a deadline that fires before the infrastructure timeout. A clean DeadlineExceeded error is better than a 502 Gateway Timeout.

Use it in event-driven systems with processing time budgets. If your agent consumes from a queue and must process each message within N seconds, set a deadline per message. Messages that exceed the deadline are NACKed for retry rather than silently timing out.

Use it with adaptive prompting. When deadline.remaining < 10, inject deadline.as_context() into the next LLM call. The model may provide a shorter, less thorough answer that still satisfies the user better than a timeout.

Skip it for batch processing where wall-clock time is not a constraint. Background batch jobs have no SLA and do not need deadlines.

Install

pip install git+https://github.com/MukundaKatta/agent-deadline

# Or from PyPI
pip install agent-deadline

from agent_deadline import Deadline, DeadlineExceeded
from fastapi import FastAPI, HTTPException

app = FastAPI()

@app.post("/agent/run")
async def run_agent(request: AgentRequest) -> AgentResponse:
    # 25s budget, 5s before the 30s load balancer timeout
    deadline = Deadline(seconds=25)

    try:
        result = await run_agent_loop(
            task=request.task,
            deadline=deadline,
        )
        return AgentResponse(result=result, elapsed=deadline.elapsed)

    except DeadlineExceeded as e:
        # Return partial result if available, else error
        raise HTTPException(
            status_code=503,
            detail=f"Agent did not complete in time ({e.elapsed:.1f}s elapsed)",
        )

async def run_agent_loop(task: str, deadline: Deadline) -> str:
    messages = [{"role": "user", "content": task}]

    while True:
        deadline.check()

        # Inject time hint when running low
        time_hint = deadline.as_context()
        if time_hint:
            messages[-1]["content"] += f"\n\n{time_hint}"

        response = await call_llm(messages)
        messages.append({"role": "assistant", "content": response.content})

        if response.stop_reason == "end_turn":
            return extract_text(response)

        for tool_call in get_tool_calls(response):
            deadline.check()
            result = await execute_tool(tool_call.name, tool_call.input)
            messages.append(build_tool_result(tool_call.id, result))

Sibling Libraries

Library	What it solves
`tool-timeout-wrap`	Forcible timeout for individual synchronous tool calls
`llm-stop-conditions`	Composable stop conditions including wall-clock based
`agent-loop-bound`	Hard iteration cap (complements deadline)
`agent-step-log`	Log elapsed time per step for deadline analysis
`llm-retry`	Retry with backoff when deadline-safe calls fail transiently

The time management stack: agent-deadline for cooperative wall-clock budgets, tool-timeout-wrap for forcible tool timeouts, agent-loop-bound for iteration caps, llm-stop-conditions for composite stop logic.

What's Next

Deadline propagation context var: a ContextVar[Deadline] that makes the active deadline accessible to any code in the call chain without explicit passing. Nested tool calls can read the remaining budget from context without the caller explicitly passing the deadline object.

Partial result protocol: DeadlineExceeded carries a partial_result field. When the agent has made progress before hitting the deadline, it can attach that progress to the exception. Callers can then return a "partial answer" response instead of an error.

Metrics: deadline.stats() that returns elapsed, remaining, fraction_used, and whether any check was close (within 2 seconds of exceeding). The "close calls" metric tells you whether your deadline budget is sized correctly for actual usage.

Built as part of the agent-stack family: composable Python primitives for production LLM agents.