What my AI agent actually does (and why it's pretty cool)
Invoice Copilot: Talk to your invoices. This self-improving AI agent answers questions and creates visual reports in real-time
So I built this thing called Invoice Copilot, and honestly, it's doing some pretty wild stuff. Let me break down what's actually happening under the hood.
The problem I was trying to solve
You know how most AI agents are pretty static? They follow instructions, produce an output, and when something breaks, nothing happens until an engineer steps in and fixes it manually. My agent has a superpower: it actually improves itself, applying fixes autonomously.
Imagine you upload a bunch of invoices and ask the AI: "Hey, show me a bar chart of my monthly expenses." Most AI tools would just give you the chart, and if there's an error, say a wrong sum, nothing happens. Mine actually creates a React component with the chart, updates your UI in real time, and if the LLM drifts (for example, the sum was wrong 😱), it detects the issue and fixes itself so it never happens again.
Full code open source at: https://github.com/Handit-AI/invoice-copilot
Let's dive in!
Table of Contents
- What my AI agent actually does (and why it's pretty cool)
- The problem I was trying to solve
- 1. Architecture Overview
- 2. Setting Up Your Environment
- 3. The Core: MainDecisionAgent 🧠 (the boss)
- 4. Action Classes: Specialized Tools for Every Task 🎯
- 5. The self-improvement (Best Part)
- 6. Results
- 7. Conclusions
1. Architecture Overview
Let's understand the architecture of our Invoice Copilot:
This architecture separates concerns into distinct nodes:
MainDecisionAgent: This node analyzes user requests and decides which actions to take based on the request type, current context, and execution history.
Update File Agent: This is the most sophisticated action class, responsible for generating complete React components with professional charts and visualizations using the Recharts library.
Load Invoice Data: This function scans the processed/ directory for JSON files containing invoice data that has been processed by Chunkr AI.
Simple Report Action: This action is used when users ask for specific information about the data that doesn't require charts or complex visualizations. It provides direct answers to questions.
Other Request Action: This action is used when users make requests that are outside the scope of business reporting or data visualization.
Finish Action - Exit Loop: If the MainDecisionAgent believes no more actions are needed, it returns a finish action and exits the loop.
Format Response: This method takes the complete execution history and generates a professional response that explains what was accomplished, addresses the user's original request, and provides helpful next steps.
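Putting these nodes together, the control flow is a simple decide-act loop: decide on a tool, run it, record the result, repeat until the agent says "finish". Here's a minimal runnable sketch of that loop; the function names and stub actions are my own illustrations, not the repo's exact code (the real `decide` step makes an LLM call):

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins for the real agent and action classes (names assumed).
def decide(query: str, history: List[Dict[str, Any]]) -> Dict[str, Any]:
    # The real MainDecisionAgent makes an LLM call here; this stub finishes after one step.
    if not history:
        return {"tool": "simple_report", "params": {"question": query}}
    return {"tool": "finish", "params": {}}

ACTIONS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "simple_report": lambda params: {"success": True, "response": "stub answer"},
}

def run_agent_loop(user_query: str, max_steps: int = 10) -> List[Dict[str, Any]]:
    """Decide-act loop: pick a tool, run it, record the result, repeat until 'finish'."""
    history: List[Dict[str, Any]] = []
    for _ in range(max_steps):
        decision = decide(user_query, history)
        if decision["tool"] == "finish":
            break
        result = ACTIONS[decision["tool"]](decision["params"])
        history.append({"decision": decision, "result": result})
    return history

history = run_agent_loop("what is the total sum of all my invoices?")
```

The `max_steps` cap matters: because the loop keeps going until the LLM chooses "finish", you want a hard limit so a confused model can't spin forever.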
2. Setting Up Your Environment
Backend
1. Clone the Repository
git clone https://github.com/Handit-AI/invoice-copilot.git
cd invoice-copilot/backend
2. Create Virtual Environment
# Create virtual environment
python -m venv .venv
# Activate virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate
3. Install Dependencies
# Install dependencies
pip install -r requirements.txt
4. Environment Configuration
# Copy environment example
cp env.example .env
5. Configure API Keys
Edit the .env file and add your API keys:
# Required API Keys
# Get your API key from: https://platform.openai.com/api-keys
OPENAI_API_KEY=your_openai_api_key_here
# Get your API key from: https://www.handit.ai/
HANDIT_API_KEY=your_handit_api_key_here
# Get your API key from: https://chunkr.ai/
CHUNKR_API_KEY=your_chunkr_api_key_here
# Optional Configuration
OPENAI_MODEL=gpt-4o-mini-2024-07-18
6. Run the Application 🚀
Development Mode
# Make sure virtual environment is activated
source .venv/bin/activate # macOS/Linux
# or
.venv\Scripts\activate # Windows
# Start the FastAPI server
python main.py
The server will start on http://localhost:8000
Frontend
1. Clone the Repository
#If you have cloned before, then skip this
git clone https://github.com/Handit-AI/invoice-copilot.git
#Go to the frontend folder
cd invoice-copilot/frontend
2. Install Dependencies
# Using npm
npm install
# Using yarn
yarn install
3. Environment Configuration
# Copy environment example
cp .env.example .env
4. Run the Application 🚀
Development Mode
# Start development server
npm run dev
# Or with yarn
yarn dev
The application will start on http://localhost:3000
3. The Core: MainDecisionAgent 🧠 (the boss)
Think of it as the brain that decides what to do next. Here's what happens:
- You ask for something - like "create a pie chart of expenses by category"
- The agent thinks about it - it looks at your request and decides what tool to use
- It picks the right action - maybe it needs to edit a file, or maybe just give you a simple answer
- Then assigns the task to the correct node - For example, if it is something that requires coding a graph then, it actually creates the React component of the graph you asked for
- It tells you what it did - gives you a nice summary
Here is a summary of the prompt and the main function:
def analyze_and_decide(self, user_query: str, execution_id: str,
history: List[Dict[str, Any]], working_dir: str = "") -> Dict[str, Any]:
# It looks at what you asked for
# It checks what it's done before (so it doesn't repeat itself)
# It decides what tool to use next
system_prompt = f"""You are a professional report and data visualization specialist.
Given the following request, decide which tool to use from the available options.
Available tools/actions:
1. edit_file: Create or edit professional reports with data visualizations, if graphs are needed to complete the user request
- Parameters: target_file, instructions, chart_description
- Example:
tool: edit_file
reason: I need to create a professional report...
[...more info...]
2. simple_report: ONLY if the user wants specific information about the data, the request is simple, and graphs are not needed to complete it
[...more info...]
3. other_request: If the user request is not related to reports, graphs, or statistics, use this tool to handle it.
[...more info...]
4. finish: Complete the task and provide final response
- No parameters required
- Example:
tool: finish
reason: I have successfully completed the user's request.
params: {{}}
Respond with a YAML object
If you believe no more actions are needed, use "finish" as the tool and explain why in the reason.
"""
# It asks the LLM what to do
response = call_llm(system_prompt, user_query)
# It gets back something like:
# tool: edit_file
# reason: User wants a visualization, so I need to create a chart
# params: target_file: DynamicWorkspace.tsx, instructions: "create pie chart"
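That flat "tool / reason / params" reply still has to be turned into something the loop can dispatch on. A real implementation would use a YAML parser; here's a minimal hand-rolled sketch (my own illustration, not the repo's parser) that handles the flat case shown above:

```python
from typing import Any, Dict

def parse_decision(raw: str) -> Dict[str, Any]:
    """Parse the agent's flat 'tool / reason / params' reply into a dict.
    A real implementation would use a YAML library; this covers the flat case."""
    decision: Dict[str, Any] = {"tool": "", "reason": "", "params": {}}
    for line in raw.strip().splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key in ("tool", "reason"):
            decision[key] = value
        elif key == "params" and value:
            # In this sketch, params arrive as "k1: v1, k2: v2" on one line.
            for pair in value.split(","):
                k, _, v = pair.partition(":")
                decision["params"][k.strip()] = v.strip().strip('"')
    return decision

raw = """tool: edit_file
reason: User wants a visualization, so I need to create a chart
params: target_file: DynamicWorkspace.tsx, instructions: "create pie chart"
"""
decision = parse_decision(raw)
# decision["tool"] is "edit_file"; decision["params"]["target_file"] is "DynamicWorkspace.tsx"
```

Note the parser only splits on the first colon per line, so a reason containing commas or colons in values still comes through intact; param values containing commas would need the full YAML treatment.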
4. Action Classes: Specialized Tools for Every Task 🎯
EditFileAction - the UI builder
This is where the magic happens. When you ask for a chart, this thing:
- Loads your real invoice data - it reads the JSON files you've uploaded
- Figures out what charts to make - it analyzes your data and decides on the best visualizations
- Generates React code - it actually writes the React component with Recharts
- Updates your UI - it replaces the file and your dashboard updates instantly
def execute(self, params: Dict[str, Any], working_dir: str = "",
execution_id: str = None) -> Dict[str, Any]:
# Load real invoice data
invoice_data = load_invoice_data()
# Tell the LLM to create a React component
system_prompt = f"""
Create a COMPLETE professional business report React component.
MANDATORY REQUIREMENTS:
- Use ONLY Recharts library for ALL charts
- Extract REAL values from the provided invoice JSON data
- Calculate actual metrics from the invoice data
- NO sample/fake data whatsoever
"""
# The LLM generates the React code
response = call_llm(system_prompt, user_prompt)
# We replace the entire file with the new component
success, message = overwrite_entire_file(full_path, new_react_code)
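The snippet doesn't show `overwrite_entire_file` itself. Since the dashboard hot-reloads the component the moment the file changes, a sensible way to implement that kind of full-file replacement is an atomic write: write to a temp file in the same directory, then rename over the target, so the UI never picks up a half-written component. A hedged sketch (my assumption about the approach, not the repo's exact code):

```python
import os
import tempfile
from typing import Tuple

def overwrite_entire_file(path: str, content: str) -> Tuple[bool, str]:
    """Atomically replace a file: write to a temp file in the same directory,
    then rename over the target so readers never see a partial write."""
    try:
        directory = os.path.dirname(path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
        os.replace(tmp_path, path)  # atomic rename on POSIX and Windows
        return True, f"Wrote {len(content)} characters to {path}"
    except OSError as exc:
        return False, f"Failed to write {path}: {exc}"
```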
SimpleReportAction - the data analyst
For simple questions like "What's my total revenue?", it just analyzes the data and gives you a straight answer:
def execute(self, params: Dict[str, Any], working_dir: str = "",
execution_id: str = None) -> Dict[str, Any]:
# Load the invoice data
invoice_data = load_invoice_data()
# Ask the LLM to analyze it
system_prompt = f"""
Answer the user's request based on the provided data.
GUIDELINES:
- For math operations, be super accurate and precise
- Focus on providing specific information requested
- Use only the real data processed
"""
response = call_llm(system_prompt, user_prompt)
return {
"success": True,
"response": response,
"request_type": "simple_report"
}
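Both actions start by calling `load_invoice_data`, which (as described in the architecture overview) scans the processed/ directory for the JSON files Chunkr AI produced. A minimal version of such a loader might look like this; the directory layout matches the article, but the implementation details are my own sketch:

```python
import json
from pathlib import Path
from typing import Any, Dict, List

def load_invoice_data(processed_dir: str = "processed") -> List[Dict[str, Any]]:
    """Collect every invoice JSON that Chunkr AI wrote into processed/."""
    invoices: List[Dict[str, Any]] = []
    for path in sorted(Path(processed_dir).glob("*.json")):
        try:
            invoices.append(json.loads(path.read_text(encoding="utf-8")))
        except json.JSONDecodeError:
            continue  # skip files that are still being written or aren't valid JSON
    return invoices
```

Skipping unparseable files instead of crashing keeps the agent responsive while Chunkr AI is still processing a fresh upload.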
OtherRequestAction - the polite redirector
If you ask it something completely unrelated like "Can you write me a poem?", it politely redirects you back to what it's good at:
def execute(self, params: Dict[str, Any], working_dir: str = "",
execution_id: str = None) -> Dict[str, Any]:
system_prompt = f"""
The user has made a request that's not related to reports or data.
Please respond politely and explain that you specialize in:
- Creating reports and data visualizations
- Analyzing invoice and financial data
- Generating charts and graphs with business insights
"""
response = call_llm(system_prompt, user_request)
return {
"success": True,
"response": response,
"request_type": "other_request"
}
[...Other Tools...]
Want to dive deep into the tools/actions and prompts? Check out the full open-source code at: https://github.com/Handit-AI/invoice-copilot
5. The self-improvement (Best Part)
Here's the really cool thing: this AI agent actually gets better over time. The secret weapon is Handit.ai.
Every action, every response is fully observed and analyzed. The system can see:
- Which decisions worked well
- Which ones failed
- How long things take
- What users actually want
- When the LLM makes mistakes
- And more...
And yes, when this powerful tool detects a mistake, it fixes it automatically.
This means the AI agent can actually improve itself. If the LLM calculates the wrong sum or generates incorrect answers, Handit.ai tracks that failure and automatically adjusts the AI agent to prevent the same mistake from happening again. It's like having an AI engineer who is constantly monitoring, evaluating and improving your AI agent.
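To make "every action is observed" concrete, here's the kind of record an observability layer captures per node call: inputs, output, latency, and any error. This is a generic illustration of the concept only; it is not Handit.ai's actual SDK or API:

```python
import time
from typing import Any, Callable, Dict, List

TRACE: List[Dict[str, Any]] = []  # a real setup would ship these records to a tracing backend

def traced(node_name: str, fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a node so every call records inputs, output, latency, and errors."""
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        record: Dict[str, Any] = {"node": node_name, "inputs": {"args": args, "kwargs": kwargs}}
        try:
            result = fn(*args, **kwargs)
            record["output"] = result
            return result
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            record["latency_ms"] = round((time.perf_counter() - start) * 1000, 3)
            TRACE.append(record)
    return wrapper

# Usage: wrap a node, call it, then inspect the trace.
simple_report = traced("simple_report_action", lambda q: f"Answer to: {q}")
simple_report("total revenue?")
```

Records like these are what make failure detection possible: an evaluator can replay the inputs, compare them against the output, and flag the node that drifted.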
To enable self-improvement, we need to complete these steps:
1. Set up Handit.ai observability
This gives us full tracing to see inside our LLM calls and tools and understand what they're doing.
Note that this project already comes configured with Handit.ai observability; you only need to get your own API token. Follow these steps:
1. Create an account here: Handit.ai
2. After creating your account, get your token here: Handit.ai token
3. Copy your token and add it to your .env file:
HANDIT_API_KEY=your_handit_token_here
Once you've completed this step, every time you interact with the conversational agent you'll get full observability in the Handit.ai Tracing Dashboard
Like this:
2. Set up evaluations
1. Add your AI provider (OpenAI, GoogleAI, etc.) token here: Token for evaluation
2. Assign evaluators to LLM nodes here: [Handit.ai Evaluation](https://dashboard.handit.ai/evaluation-hub)
For this project, specifically assign the Correctness evaluator to simple_report_action. This is important because it evaluates the accuracy of the text generated by that LLM node.
Like this:
Of course, you can assign different evaluators to more LLM nodes.
3. Set up self-improvement (very interesting part)
1. Run this on your terminal:
npm install -g @handit.ai/cli
2. Run this command and follow the terminal instructions; this connects your repository to Handit for automatic PR creation:
handit-cli github
✨ What happens next: Every time Handit detects that your AI agent failed, it will automatically send you a PR to your repo with the fixes!
This is like having an AI engineer who never sleeps, constantly monitoring your agent and fixing issues before you even notice them! 🤖👨
6. Results
To test the project, first drag and drop some invoices (jpg, png, pdf, jpeg) into the chat interface
First input: create a report based on my invoices, with some charts
Result:
Second input: what is the total sum of all my invoices?
Result:
Whoa! Seems like the LLM made a mistake: the sum of $634.73 + $440.00 + $685.35 + $248.98 is $2,009.06, not the $1,998.06 it reported.
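You can verify the discrepancy with three lines of Python:

```python
# The four invoice totals from the report, summed and rounded to cents.
amounts = [634.73, 440.00, 685.35, 248.98]
total = round(sum(amounts), 2)
print(total)  # 2009.06, not the 1998.06 the LLM reported
```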
Let’s check Observability to figure out what’s going on:
That's right! Handit has detected that the LLM made a mistake: the total sum reported in the output is incorrect; it does not accurately reflect the sum of the individual invoices. The breakdown of the invoice amounts also contains errors in the total calculation.
But now what? 🤔
What happens when your AI agent is in production and fails silently? You could lose users without even knowing why - that's where the real magic happens.
Handit will automatically fix your AI agent and send you a PR to your repo! ✅
Like this example:
It automatically improves your AI agent, fixing issues so they never happen again:
Now you have an AI agent with self-improvement capabilities.
7. Conclusions
Thanks for reading!
I hope this deep dive into building a self-improving AI agent has been useful for your own projects.
The project is fully open source - feel free to:
🔧 Modify it for your specific needs
🏭 Adapt it to any industry (healthcare, finance, retail, etc.)
🚀 Use it as a foundation for your own AI agents
🤝 Contribute improvements back to the community
Full code open source at: https://github.com/Handit-AI/invoice-copilot.
This project comes with Handit.ai configured. If you want to configure Handit.ai for your own projects, I suggest following the documentation: https://docs.handit.ai/quickstart
What new feature should this project have? Let me know in the comments! 💬