Harish Kotra (he/him)

Posted on Jun 11

Building Self-Healing LLM Agents with Agno and Streamlit

#ai #programming #python #dailybuild2026

LLM agents are powerful, but they make mistakes, especially when calling tools with misformatted parameters. What if we could make them self-healing? In this post, I'll walk through building a production-grade Streamlit application with 14 real-world scenarios in which agents detect, diagnose, and correct their own tool-call errors automatically.

The Problem

When an LLM calls a tool with invalid parameters — a float instead of an integer, "tomorrow" instead of a date, "California" instead of "CA" — most systems simply fail. The user sees an error and has to start over. This breaks the illusion of autonomy and frustrates users.

The Solution: `RetryAgentRun`

The Agno framework provides a built-in mechanism for this: RetryAgentRun (importable from agno.exceptions). When a tool raises this exception with a descriptive error message, the agent:

1. Receives the error
2. Reads the correction hint
3. Re-thinks the parameters
4. Retries the tool call automatically

Each failed call costs one extra LLM round-trip, which is far cheaper than a human having to debug the failure and re-prompt.

Architecture Overview

```text
User Prompt → Agno Agent → Tool Call (may be wrong)
                           ↓
                    Tool validates parameters
                           ↓
              Invalid? → Raise RetryAgentRun → Agent corrects → Retry
              Valid?   → Return result        → Done
```
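The loop in the diagram can be simulated without a model or Agno installed. In this minimal, framework-free sketch, `RetryAgentRun` is a local stand-in class for Agno's exception, and `fake_model` and `run_agent` are hypothetical stubs that play the LLM and the agent runtime: the model first passes the raw user value, then applies the correction after reading the error. Agno runs this loop internally; this is not its implementation, only an illustration of the control flow.

```python
from typing import Any, Callable, List, Optional, Tuple


class RetryAgentRun(Exception):
    """Local stand-in for agno.exceptions.RetryAgentRun."""


def process_payment(amount: Any) -> str:
    # Validate inside the body and raise an actionable error message.
    if isinstance(amount, bool) or not isinstance(amount, int):
        raise RetryAgentRun(
            f"amount must be an int, got {type(amount).__name__} {amount!r}. "
            "Convert it to an integer."
        )
    return f"charged {amount}"


def fake_model(user_value: Any, feedback: Optional[str]) -> Any:
    # Hypothetical LLM stub: pass the exact user value on the first call,
    # then "read the hint" and convert the value to int on the retry.
    return user_value if feedback is None else int(user_value)


def run_agent(tool: Callable[[Any], str], user_value: Any, max_retries: int = 2):
    feedback: Optional[str] = None
    timeline: List[Tuple[str, Any, str]] = []
    for _ in range(max_retries + 1):
        args = fake_model(user_value, feedback)
        try:
            result = tool(args)
        except RetryAgentRun as err:
            # The error text becomes the model's input for the next attempt.
            feedback = str(err)
            timeline.append(("retry", args, feedback))
            continue
        timeline.append(("success", args, result))
        return result, timeline
    return None, timeline  # gave up after max_retries corrections


result, timeline = run_agent(process_payment, 100.5)
print(result)  # charged 100
for step in timeline:
    print(step)
```

The `max_retries` cap is this sketch's own guard against a model that never converges; a real deployment should rely on whatever bound its framework or app enforces.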

Key Implementation Lessons

1. Use `Any` Type Annotations

This was the biggest gotcha. Pydantic validates annotated parameters before the function body runs. If you write:

```python
from typing import Annotated

def process_payment(amount: Annotated[int, ...], currency: str):
    ...
```

And the LLM passes 100.5 (a float), Pydantic raises a ValidationError before your function ever executes. The RetryAgentRun never fires.

The fix: use Any for all parameters and validate inside the function body:

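```python
from typing import Any

from agno.exceptions import RetryAgentRun


def process_payment(amount: Any, currency: Any):
    # bool is a subclass of int, so reject True/False explicitly
    if isinstance(amount, bool) or not isinstance(amount, int):
        # Only suggest a concrete value when the conversion is safe
        hint = f" (e.g. {int(amount)})" if isinstance(amount, float) else ""
        raise RetryAgentRun(
            f"amount must be an int, got {type(amount).__name__} '{amount}'. "
            f"Convert it to an integer{hint}."
        )
    ...
```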
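The same pattern carries over to the scheduling case from the problem statement, where the user writes "tomorrow" instead of a date. In this minimal sketch, `book_appointment` and `RELATIVE_DAYS` are hypothetical names, `RetryAgentRun` is a local stand-in so the code runs without Agno, and the optional `today` argument exists only to make the example testable. The key design choice is that the error message contains the already-resolved ISO date, so the model can copy it verbatim on the retry instead of guessing.

```python
from datetime import date, timedelta
from typing import Any, Optional


class RetryAgentRun(Exception):
    """Local stand-in for agno.exceptions.RetryAgentRun."""


# Hypothetical table of relative words the hint knows how to resolve.
RELATIVE_DAYS = {"today": 0, "tomorrow": 1}


def book_appointment(day: Any, today: Optional[date] = None) -> str:
    today = today or date.today()  # injectable "today" keeps the sketch testable
    text = str(day).strip()
    try:
        booked = date.fromisoformat(text)  # e.g. "2026-06-12"
    except ValueError:
        offset = RELATIVE_DAYS.get(text.lower())
        if offset is not None:
            hint = f"Use {(today + timedelta(days=offset)).isoformat()}."
        else:
            hint = "Use the YYYY-MM-DD format."
        raise RetryAgentRun(f"day must be an ISO date, got {day!r}. {hint}") from None
    return f"booked for {booked.isoformat()}"


print(book_appointment("2026-06-12"))  # booked for 2026-06-12
try:
    book_appointment("tomorrow", today=date(2026, 6, 11))
except RetryAgentRun as err:
    print(err)  # day must be an ISO date, got 'tomorrow'. Use 2026-06-12.
```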

2. Force the Agent to Make Mistakes

Self-healing only works if there's something to heal. The agent preamble instructs the model to pass exact user-provided values on the first call:

CRITICAL: Call the tool with the EXACT parameter values the user provides — 
do NOT convert, transform, or reformat them yourself. The tool will validate 
and return a detailed error message if anything needs to be fixed.

Without this, capable models like GPT-4 or Llama 3 tend to pre-correct values on their own, so you never see the healing cycle in action.

3. Timeout Safety

Not all models support function/tool calling. A 60-second thread-pool timeout prevents the app from hanging:

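```python
import concurrent.futures
executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
try:
    result = executor.submit(agent.run, prompt).result(timeout=60)  # TimeoutError after 60 s
finally:
    executor.shutdown(wait=False)  # unlike `with`, don't block on a hung run
```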
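The timeout can be packaged as a small helper and checked against a deliberately slow stand-in. In this sketch, `run_with_timeout` and `slow_agent_run` are hypothetical names (the latter stands in for `agent.run` on a model that never answers). One detail matters: leaving a `with ThreadPoolExecutor(...)` block calls `shutdown(wait=True)`, which waits for the hung call and defeats the timeout, so the helper shuts the executor down with `wait=False` instead.

```python
import concurrent.futures
import time
from typing import Any, Callable, Optional


def run_with_timeout(fn: Callable[[str], Any], prompt: str, timeout: float = 60.0) -> Optional[Any]:
    """Return fn(prompt), or None if it does not finish within `timeout` seconds."""
    executor = concurrent.futures.ThreadPoolExecutor(max_workers=1)
    future = executor.submit(fn, prompt)
    try:
        return future.result(timeout=timeout)
    except concurrent.futures.TimeoutError:
        return None
    finally:
        # wait=False returns control to the UI immediately. Python threads cannot
        # be killed, so a hung call keeps running in the background until it ends.
        executor.shutdown(wait=False)


def slow_agent_run(prompt: str) -> str:
    # Hypothetical stand-in for agent.run on a model that never answers in time.
    time.sleep(2)
    return f"too late for {prompt!r}"


start = time.monotonic()
print(run_with_timeout(slow_agent_run, "hello", timeout=0.5))  # None
print(f"returned after {time.monotonic() - start:.1f}s instead of 2s")
```

Returning `None` keeps the sketch short; in the app, a timeout is better surfaced as an explicit "model did not respond" message so it is not confused with an empty result.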

4. Parsing the Timeline

Agno's Message objects include a tool_call_error boolean. By scanning messages and extracting tool call arguments, error messages, and results, we build a structured timeline that visualises the entire self-healing cycle — tool call → retry error → corrected retry → success.
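A minimal sketch of that scan follows. It uses a hypothetical `Msg` dataclass rather than Agno's `Message`: apart from `tool_call_error`, the field names (`role`, `content`, `tool_name`, `tool_args`) are assumptions, so check them against the Agno version you use. `build_timeline` and `count_self_heals` are likewise illustrative names; a "self-heal" is counted here as a failed call followed by a successful call to the same tool.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Msg:
    """Hypothetical stand-in for Agno's Message (field names assumed)."""
    role: str
    content: Optional[str] = None
    tool_name: Optional[str] = None
    tool_args: Dict[str, Any] = field(default_factory=dict)
    tool_call_error: bool = False


def build_timeline(messages: List[Msg]) -> List[Dict[str, Any]]:
    timeline = []
    for msg in messages:
        if msg.role != "tool":
            continue  # only tool results carry call outcomes
        timeline.append({
            "tool": msg.tool_name,
            "args": msg.tool_args,
            "status": "retry" if msg.tool_call_error else "success",
            "detail": msg.content,
        })
    return timeline


def count_self_heals(timeline: List[Dict[str, Any]]) -> int:
    # A heal = a failed call immediately followed by a success on the same tool.
    return sum(
        1
        for prev, cur in zip(timeline, timeline[1:])
        if prev["status"] == "retry" and cur["status"] == "success"
        and prev["tool"] == cur["tool"]
    )


messages = [
    Msg("user", "Charge 100.5 USD"),
    Msg("tool", "amount must be an int, got float 100.5", "process_payment", {"amount": 100.5}, True),
    Msg("tool", "charged 100", "process_payment", {"amount": 100}),
    Msg("assistant", "Done: charged 100 USD."),
]
timeline = build_timeline(messages)
print([step["status"] for step in timeline])  # ['retry', 'success']
print(count_self_heals(timeline))  # 1
```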

14 Real-World Scenarios

The application covers 6 categories with 14 concrete use cases:

| Category | Use Cases | What Heals |
|---|---|---|
| 💰 Financial | Payment processing, Credit card validation | Float→int, formatted card numbers |
| 📅 Scheduling | Appointment booking, Timezone conversion | Relative dates, ambiguous timezone codes |
| 🖥️ DevOps | Server provisioning, URL health check | Port types, missing URL schemes |
| 📧 Communication | Email validation, Phone formatting | Invalid emails, unformatted phones |
| 📊 Data | Database queries, Data import, Address validation | Wrong tables, missing fields, state codes |
| 🌍 Others | Inventory, Unit conversion, Sanitisation, Geocoding | SKU lookups, incompatible units, XSS content |
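As one concrete instance from the table, here is a minimal sketch of the DevOps URL check healing a missing scheme. The tool name `check_url_health` is hypothetical, `RetryAgentRun` is a local stand-in so the code runs without Agno, and the sketch stops short of making the actual HTTP request. It only proposes a fix when the scheme is simply absent, so an unsupported scheme such as `ftp://` gets an error without a misleading suggestion.

```python
from typing import Any
from urllib.parse import urlparse


class RetryAgentRun(Exception):
    """Local stand-in for agno.exceptions.RetryAgentRun."""


def check_url_health(url: Any) -> str:
    text = str(url).strip()
    parsed = urlparse(text)
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        # Only propose a fix when the scheme is simply missing.
        hint = f" Try 'https://{text}'." if "://" not in text else ""
        raise RetryAgentRun(
            f"url must start with http:// or https:// and include a host, got {url!r}.{hint}"
        )
    # A real tool would issue the HTTP request here (for example with httpx).
    return f"would check {parsed.scheme}://{parsed.netloc}{parsed.path or '/'}"


print(check_url_health("https://example.com/health"))  # would check https://example.com/health
try:
    check_url_health("example.com/health")
except RetryAgentRun as err:
    print(err)
```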

Results & Metrics

The app tracks run history with per-use-case metrics:

- Tool calls: total invocations per run
- Self-heals: how many RetryAgentRun events were triggered and resolved
- Success rate: percentage of runs that completed successfully

Interactive Plotly charts show trends over time and breakdowns by scenario.
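One reasonable way to roll run history up into those per-use-case metrics is sketched below with the standard library only. The record shape (`use_case`, `tool_calls`, `self_heals`, `success`) and the `summarize` helper are assumptions, and the `history` list is made-up sample input, not measured results. The resulting dictionary could then be turned into a pandas DataFrame for the Plotly charts.

```python
from collections import defaultdict
from typing import Dict, List


def summarize(runs: List[dict]) -> Dict[str, dict]:
    """Aggregate per-use-case metrics from a hypothetical run-history list."""
    buckets: Dict[str, List[dict]] = defaultdict(list)
    for run in runs:
        buckets[run["use_case"]].append(run)
    summary = {}
    for use_case, items in buckets.items():
        successes = sum(1 for r in items if r["success"])
        summary[use_case] = {
            "runs": len(items),
            "avg_tool_calls": sum(r["tool_calls"] for r in items) / len(items),
            "self_heals": sum(r["self_heals"] for r in items),
            "success_rate": 100.0 * successes / len(items),  # percent
        }
    return summary


# Illustrative sample input only.
history = [
    {"use_case": "payment", "tool_calls": 2, "self_heals": 1, "success": True},
    {"use_case": "payment", "tool_calls": 3, "self_heals": 1, "success": False},
    {"use_case": "booking", "tool_calls": 2, "self_heals": 1, "success": True},
]
print(summarize(history)["payment"])
# {'runs': 2, 'avg_tool_calls': 2.5, 'self_heals': 2, 'success_rate': 50.0}
```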

Running It Yourself

```shell
pip install streamlit pandas plotly agno openai httpx pydantic
streamlit run app.py
```

Configure your provider in the sidebar — Ollama, OpenAI, or any OpenAI-compatible endpoint. The Test Connection button verifies reachability before you start.
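A connection check along these lines could look like the following minimal sketch. It assumes the endpoint implements the OpenAI-style `GET /models` listing, which OpenAI and Ollama's OpenAI-compatible API (served under `/v1`) both provide; `check_connection` is a hypothetical name, and the app itself may verify reachability differently. It uses only the standard library, although `httpx` from the install list would work just as well.

```python
import http.client
import json
import urllib.request


def check_connection(base_url: str, api_key: str = "", timeout: float = 5.0) -> bool:
    """Return True if base_url answers an OpenAI-style GET /models request."""
    try:
        request = urllib.request.Request(base_url.rstrip("/") + "/models")
        if api_key:
            request.add_header("Authorization", f"Bearer {api_key}")
        with urllib.request.urlopen(request, timeout=timeout) as response:
            payload = json.load(response)
    except (OSError, ValueError, http.client.HTTPException):
        # OSError: refused connections, timeouts, HTTP error statuses.
        # ValueError: malformed URLs, non-JSON bodies.
        # HTTPException: malformed HTTP responses.
        return False
    # OpenAI-compatible servers wrap the model list in a "data" array.
    return isinstance(payload, dict) and isinstance(payload.get("data"), list)


# For a local Ollama server, the OpenAI-compatible API lives under /v1.
print(check_connection("http://localhost:11434/v1"))
```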

Self-healing agents transform fragile LLM tool calls into robust, autonomous workflows. The combination of Agno's RetryAgentRun, careful type annotation strategy, and thoughtful prompt engineering creates a system that can detect and fix its own mistakes — automatically, in real time.

The full source code is available with 14 ready-to-run scenarios. Try it with your favourite model and watch the agent heal itself.

Code & more: https://www.dailybuild.xyz/project/160-self-healing-agent
