What my AI agent actually does (and why it's pretty cool)
Invoice Copilot: Talk to your invoices. This self-improving AI agent answers questions and creates visual reports in real-time
So I built this thing called Invoice Copilot, and honestly, it's doing some pretty wild stuff. Let me break down what's actually happening under the hood.
The problem I was trying to solve
You know how most AI agents are pretty static? They follow instructions, produce an output, and when something breaks, nothing happens until an engineer steps in and fixes it manually. My agent has a superpower: it actually improves itself, applying fixes autonomously.
Imagine you upload a bunch of invoices and ask the AI: "Hey, show me a bar chart of my monthly expenses." Most AI tools would just give you the chart, and if there's an error, say a wrong sum, nothing happens. Mine actually creates a React component with the chart, updates your UI in real time, and if the LLM drifts (for example, the sum was wrong 😱), it detects the issue and fixes itself so it never happens again.
Full code open source at: https://github.com/Handit-AI/invoice-copilot
Let's dive in!
Table of Contents
- What my AI agent actually does (and why it's pretty cool)
- The problem I was trying to solve
- 1. Architecture Overview
- 2. Setting Up Your Environment
- 3. The Core: MainDecisionAgent 🧠 (the boss)
- 4. Action Classes: Specialized Tools for Every Task 🎯
- 5. The self-improvement (Best Part)
- 6. Results
- 7. Conclusions
1. Architecture Overview
Let's understand the architecture of our Invoice Copilot:
This architecture separates concerns into distinct nodes:
MainDecisionAgent: This node analyzes user requests and decides which actions to take based on the request type, current context, and execution history.
Update File Agent: This is the most sophisticated action class, responsible for generating complete React components with professional charts and visualizations using the Recharts library.
Load Invoice Data: This function scans the processed/ directory for JSON files containing invoice data that has been processed by Chunkr AI.
Simple Report Action: This action is used when users ask for specific information about the data that doesn't require charts or complex visualizations. It provides direct answers to questions.
Other Request Action: This action is used when users make requests that are outside the scope of business reporting or data visualization.
Finish Action - Exit Loop: If the MainDecisionAgent believes no more actions are needed, it returns a finish action and exits the loop.
Format Response: This method takes the complete execution history and generates a professional response that explains what was accomplished, addresses the user's original request, and provides helpful next steps.
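Putting these nodes together, the control flow is a simple decide-act loop: decide on a tool, run it, record the result, repeat until the agent says "finish". Here's a minimal runnable sketch of that loop; the function names and stub actions are my own illustrations, not the repo's exact code (the real `decide` step makes an LLM call):

```python
from typing import Any, Callable, Dict, List

# Hypothetical stand-ins for the real agent and action classes (names assumed).
def decide(query: str, history: List[Dict[str, Any]]) -> Dict[str, Any]:
    # The real MainDecisionAgent makes an LLM call here; this stub finishes after one step.
    if not history:
        return {"tool": "simple_report", "params": {"question": query}}
    return {"tool": "finish", "params": {}}

ACTIONS: Dict[str, Callable[[Dict[str, Any]], Dict[str, Any]]] = {
    "simple_report": lambda params: {"success": True, "response": "stub answer"},
}

def run_agent_loop(user_query: str, max_steps: int = 10) -> List[Dict[str, Any]]:
    """Decide-act loop: pick a tool, run it, record the result, repeat until 'finish'."""
    history: List[Dict[str, Any]] = []
    for _ in range(max_steps):
        decision = decide(user_query, history)
        if decision["tool"] == "finish":
            break
        result = ACTIONS[decision["tool"]](decision["params"])
        history.append({"decision": decision, "result": result})
    return history

history = run_agent_loop("what is the total sum of all my invoices?")
```

The `max_steps` cap matters: because the loop keeps going until the LLM chooses "finish", you want a hard limit so a confused model can't spin forever.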
2. Setting Up Your Environment
Backend
1. Clone the Repository
git clone https://github.com/Handit-AI/invoice-copilot.git
cd invoice-copilot/backend
2. Create Virtual Environment
# Create virtual environment
python -m venv .venv
# Activate virtual environment
# On macOS/Linux:
source .venv/bin/activate
# On Windows:
.venv\Scripts\activate
3. Install Dependencies
# Install dependencies
pip install -r requirements.txt
4. Environment Configuration
# Copy environment example
cp env.example .env
5. Configure API Keys
Edit the .env file and add your API keys:
# Required API Keys
# Get your API key from: https://platform.openai.com/api-keys
OPENAI_API_KEY=your_openai_api_key_here
# Get your API key from: https://www.handit.ai/
HANDIT_API_KEY=your_handit_api_key_here
# Get your API key from: https://chunkr.ai/
CHUNKR_API_KEY=your_chunkr_api_key_here
# Optional Configuration
OPENAI_MODEL=gpt-4o-mini-2024-07-18
6. Run the Application 🚀
Development Mode
# Make sure virtual environment is activated
source .venv/bin/activate # macOS/Linux
# or
.venv\Scripts\activate # Windows
# Start the FastAPI server
python main.py
The server will start on http://localhost:8000
Frontend
1. Clone the Repository
#If you have cloned before, then skip this
git clone https://github.com/Handit-AI/invoice-copilot.git
#Go to the frontend folder
cd invoice-copilot/frontend
2. Install Dependencies
# Using npm
npm install
# Using yarn
yarn install
3. Environment Configuration
# Copy environment example
cp .env.example .env
4. Run the Application 🚀
Development Mode
# Start development server
npm run dev
# Or with yarn
yarn dev
The application will start on http://localhost:3000
3. The Core: MainDecisionAgent 🧠 (the boss)
Think of it as the brain that decides what to do next. Here's what happens:
- You ask for something - like "create a pie chart of expenses by category"
- The agent thinks about it - it looks at your request and decides what tool to use
- It picks the right action - maybe it needs to edit a file, or maybe just give you a simple answer
- Then assigns the task to the correct node - For example, if it is something that requires coding a graph then, it actually creates the React component of the graph you asked for
- It tells you what it did - gives you a nice summary
Here is a summary of the prompt and the main function:
def analyze_and_decide(self, user_query: str, execution_id: str,
history: List[Dict[str, Any]], working_dir: str = "") -> Dict[str, Any]:
# It looks at what you asked for
# It checks what it's done before (so it doesn't repeat itself)
# It decides what tool to use next
system_prompt = f"""You are a professional report and data visualization specialist.
Given the following request, decide which tool to use from the available options.
Available tools/actions:
1. edit_file: Create or edit professional reports with data visualizations, if graphs are needed to complete the user request
- Parameters: target_file, instructions, chart_description
- Example:
tool: edit_file
reason: I need to create a professional report...
[...more info...]
2. simple_report: ONLY if the user wants specific information about the data, the request is simple, and graphs are not needed to complete it
[...more info...]
3. other_request: If the user request is not related to reports, graphs, or statistics, use this tool to handle it.
[...more info...]
4. finish: Complete the task and provide final response
- No parameters required
- Example:
tool: finish
reason: I have successfully completed the user's request.
params: {{}}
Respond with a YAML object
If you believe no more actions are needed, use "finish" as the tool and explain why in the reason.
"""
# It asks the LLM what to do
response = call_llm(system_prompt, user_query)
# It gets back something like:
# tool: edit_file
# reason: User wants a visualization, so I need to create a chart
# params: target_file: DynamicWorkspace.tsx, instructions: "create pie chart"
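That flat "tool / reason / params" reply still has to be turned into something the loop can dispatch on. A real implementation would use a YAML parser; here's a minimal hand-rolled sketch (my own illustration, not the repo's parser) that handles the flat case shown above:

```python
from typing import Any, Dict

def parse_decision(raw: str) -> Dict[str, Any]:
    """Parse the agent's flat 'tool / reason / params' reply into a dict.
    A real implementation would use a YAML library; this covers the flat case."""
    decision: Dict[str, Any] = {"tool": "", "reason": "", "params": {}}
    for line in raw.strip().splitlines():
        key, _, value = line.partition(":")
        key, value = key.strip(), value.strip()
        if key in ("tool", "reason"):
            decision[key] = value
        elif key == "params" and value:
            # In this sketch, params arrive as "k1: v1, k2: v2" on one line.
            for pair in value.split(","):
                k, _, v = pair.partition(":")
                decision["params"][k.strip()] = v.strip().strip('"')
    return decision

raw = """tool: edit_file
reason: User wants a visualization, so I need to create a chart
params: target_file: DynamicWorkspace.tsx, instructions: "create pie chart"
"""
decision = parse_decision(raw)
# decision["tool"] is "edit_file"; decision["params"]["target_file"] is "DynamicWorkspace.tsx"
```

Note the parser only splits on the first colon per line, so a reason containing commas or colons in values still comes through intact; param values containing commas would need the full YAML treatment.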
4. Action Classes: Specialized Tools for Every Task 🎯
EditFileAction - the UI builder
This is where the magic happens. When you ask for a chart, this thing:
- Loads your real invoice data - it reads the JSON files you've uploaded
- Figures out what charts to make - it analyzes your data and decides on the best visualizations
- Generates React code - it actually writes the React component with Recharts
- Updates your UI - it replaces the file and your dashboard updates instantly
def execute(self, params: Dict[str, Any], working_dir: str = "",
execution_id: str = None) -> Dict[str, Any]:
# Load real invoice data
invoice_data = load_invoice_data()
# Tell the LLM to create a React component
system_prompt = f"""
Create a COMPLETE professional business report React component.
MANDATORY REQUIREMENTS:
- Use ONLY Recharts library for ALL charts
- Extract REAL values from the provided invoice JSON data
- Calculate actual metrics from the invoice data
- NO sample/fake data whatsoever
"""
# The LLM generates the React code
response = call_llm(system_prompt, user_prompt)
# We replace the entire file with the new component
success, message = overwrite_entire_file(full_path, new_react_code)
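The snippet doesn't show `overwrite_entire_file` itself. Since the dashboard hot-reloads the component the moment the file changes, a sensible way to implement that kind of full-file replacement is an atomic write: write to a temp file in the same directory, then rename over the target, so the UI never picks up a half-written component. A hedged sketch (my assumption about the approach, not the repo's exact code):

```python
import os
import tempfile
from typing import Tuple

def overwrite_entire_file(path: str, content: str) -> Tuple[bool, str]:
    """Atomically replace a file: write to a temp file in the same directory,
    then rename over the target so readers never see a partial write."""
    try:
        directory = os.path.dirname(path) or "."
        fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
        with os.fdopen(fd, "w", encoding="utf-8") as tmp:
            tmp.write(content)
        os.replace(tmp_path, path)  # atomic rename on POSIX and Windows
        return True, f"Wrote {len(content)} characters to {path}"
    except OSError as exc:
        return False, f"Failed to write {path}: {exc}"
```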
SimpleReportAction - the data analyst
For simple questions like "What's my total revenue?", it just analyzes the data and gives you a straight answer:
def execute(self, params: Dict[str, Any], working_dir: str = "",
execution_id: str = None) -> Dict[str, Any]:
# Load the invoice data
invoice_data = load_invoice_data()
# Ask the LLM to analyze it
system_prompt = f"""
Answer the user's request based on the provided data.
GUIDELINES:
- For math operations, be super accurate and precise
- Focus on providing specific information requested
- Use only the real data processed
"""
response = call_llm(system_prompt, user_prompt)
return {
"success": True,
"response": response,
"request_type": "simple_report"
}
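Both actions start by calling `load_invoice_data`, which (as described in the architecture overview) scans the processed/ directory for the JSON files Chunkr AI produced. A minimal version of such a loader might look like this; the directory layout matches the article, but the implementation details are my own sketch:

```python
import json
from pathlib import Path
from typing import Any, Dict, List

def load_invoice_data(processed_dir: str = "processed") -> List[Dict[str, Any]]:
    """Collect every invoice JSON that Chunkr AI wrote into processed/."""
    invoices: List[Dict[str, Any]] = []
    for path in sorted(Path(processed_dir).glob("*.json")):
        try:
            invoices.append(json.loads(path.read_text(encoding="utf-8")))
        except json.JSONDecodeError:
            continue  # skip files that are still being written or aren't valid JSON
    return invoices
```

Skipping unparseable files instead of crashing keeps the agent responsive while Chunkr AI is still processing a fresh upload.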
OtherRequestAction - the polite redirector
If you ask it something completely unrelated like "Can you write me a poem?", it politely redirects you back to what it's good at:
def execute(self, params: Dict[str, Any], working_dir: str = "",
execution_id: str = None) -> Dict[str, Any]:
system_prompt = f"""
The user has made a request that's not related to reports or data.
Please respond politely and explain that you specialize in:
- Creating reports and data visualizations
- Analyzing invoice and financial data
- Generating charts and graphs with business insights
"""
response = call_llm(system_prompt, user_request)
return {
"success": True,
"response": response,
"request_type": "other_request"
}
[...Other Tools...]
Want to dive deep into the tools/actions and prompts? Check out the full open-source code at: https://github.com/Handit-AI/invoice-copilot
5. The self-improvement (Best Part)
Here's the really cool thing: this AI agent actually gets better over time. The secret weapon is Handit.ai.
Every action, every response is fully observed and analyzed. The system can see:
- Which decisions worked well
- Which ones failed
- How long things take
- What users actually want
- When the LLM makes mistakes
- And more...
And yes, when this powerful tool detects a mistake, it fixes it automatically.
This means the AI agent can actually improve itself. If the LLM calculates the wrong sum or generates incorrect answers, Handit.ai tracks that failure and automatically adjusts the AI agent to prevent the same mistake from happening again. It's like having an AI engineer who is constantly monitoring, evaluating and improving your AI agent.
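To make "every action is observed" concrete, here's the kind of record an observability layer captures per node call: inputs, output, latency, and any error. This is a generic illustration of the concept only; it is not Handit.ai's actual SDK or API:

```python
import time
from typing import Any, Callable, Dict, List

TRACE: List[Dict[str, Any]] = []  # a real setup would ship these records to a tracing backend

def traced(node_name: str, fn: Callable[..., Any]) -> Callable[..., Any]:
    """Wrap a node so every call records inputs, output, latency, and errors."""
    def wrapper(*args: Any, **kwargs: Any) -> Any:
        start = time.perf_counter()
        record: Dict[str, Any] = {"node": node_name, "inputs": {"args": args, "kwargs": kwargs}}
        try:
            result = fn(*args, **kwargs)
            record["output"] = result
            return result
        except Exception as exc:
            record["error"] = repr(exc)
            raise
        finally:
            record["latency_ms"] = round((time.perf_counter() - start) * 1000, 3)
            TRACE.append(record)
    return wrapper

# Usage: wrap a node, call it, then inspect the trace.
simple_report = traced("simple_report_action", lambda q: f"Answer to: {q}")
simple_report("total revenue?")
```

Records like these are what make failure detection possible: an evaluator can replay the inputs, compare them against the output, and flag the node that drifted.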
To enable self-improvement, we need to complete these steps:
1. Set up Handit.ai observability
This gives us full tracing to see inside our LLM calls and tools and understand what they're doing.
Note that this project already comes configured with Handit.ai observability; you only need to get your own API token. Follow these steps:
1. Create an account here: Handit.ai
2. After creating your account, get your token here: Handit.ai token
3. Copy your token and add it to your .env file:
HANDIT_API_KEY=your_handit_token_here
Once you've completed this step, every time you interact with the conversational agent you'll get full observability in the Handit.ai Tracing Dashboard
Like this:
2. Set up evaluations
1. Add your AI provider (OpenAI, GoogleAI, etc.) token here: Token for evaluation
2. Assign evaluators to LLM nodes here: [Handit.ai Evaluation](https://dashboard.handit.ai/evaluation-hub)
For this project, specifically assign the Correctness evaluator to simple_report_action. This is important because it evaluates the accuracy of the text generated by that LLM node.
Like this:
Of course, you can assign different evaluators to more LLM nodes.
3. Set up self-improvement (very interesting part)
1. Run this on your terminal:
npm install -g @handit.ai/cli
2. Run this command and follow the terminal instructions; this connects your repository to Handit for automatic PR creation:
handit-cli github
✨ What happens next: Every time Handit detects that your AI agent failed, it will automatically send you a PR to your repo with the fixes!
This is like having an AI engineer who never sleeps, constantly monitoring your agent and fixing issues before you even notice them! 🤖👨
6. Results
To test the project, first drag and drop some invoices (jpg, png, pdf, jpeg) into the chat interface
First input: create a report based on my invoices, with some charts
Result:
Second input: what is the total sum of all my invoices?
Result:
Whoa! Seems like the LLM made a mistake: the sum of $634.73 + $440.00 + $685.35 + $248.98 is $2,009.06, not the $1,998.06 it reported.
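You can verify the discrepancy with three lines of Python:

```python
# The four invoice totals from the report, summed and rounded to cents.
amounts = [634.73, 440.00, 685.35, 248.98]
total = round(sum(amounts), 2)
print(total)  # 2009.06, not the 1998.06 the LLM reported
```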
Let’s check Observability to figure out what’s going on:
That's right! Handit has detected that the LLM made a mistake: the total sum reported in the output is incorrect; it does not accurately reflect the sum of the individual invoices. The breakdown of the invoice amounts also contains errors in the total calculation.
But now what? 🤔
What happens when your AI agent is in production and fails silently? You could lose users without even knowing why - that's where the real magic happens.
Handit will automatically fix your AI agent and send you a PR to your repo! ✅
Like this example:
It automatically improves your AI agent, fixing issues so they never happen again:
Now you have an AI agent with self-improvement capabilities.
7. Conclusions
Thanks for reading!
I hope this deep dive into building a self-improving AI agent has been useful for your own projects.
The project is fully open source - feel free to:
🔧 Modify it for your specific needs
🏭 Adapt it to any industry (healthcare, finance, retail, etc.)
🚀 Use it as a foundation for your own AI agents
🤝 Contribute improvements back to the community
Full code open source at: https://github.com/Handit-AI/invoice-copilot.
This project comes with Handit.ai configured. If you want to configure Handit.ai for your own projects, I suggest following the documentation: https://docs.handit.ai/quickstart
What new feature should this project have? Let me know in the comments! 💬