AI agents are quickly moving from simple chat interfaces to systems that can use tools, access data, trigger workflows, write messages, and sometimes take actions on behalf of users. That shift is exciting, but it also creates a serious security question:
How do we evaluate the risk of an AI agent before we deploy it?
That question led me to build AgentGuardian, a local-first AI security web app that scans agentic AI workflows for risks such as prompt injection, tool misuse, excessive autonomy, sensitive data exposure, insecure output handling, and lack of human oversight.
The goal was to build a practical prototype using:
- Python
- Streamlit
- Pandas
- Ollama
- A local LLM
- A deterministic rule-based risk scoring engine
No external LLM API key is required.
Why I Built AgentGuardian
As AI agents become more capable, they are increasingly being connected to tools such as:
- Files
- Databases
- CRMs
- Ticketing systems
- Calendars
- Payment systems
- Web browsers
These tools make agents more useful, but each one also widens the attack surface.
For example, an AI assistant that only summarizes public documents has a very different risk profile from an AI agent that reads customer complaints, checks order history, drafts refund responses, and sends emails to customers.
The second agent has access to sensitive data, external inputs, and business-impacting tools. That means it needs a stronger security review before deployment.
I wanted AgentGuardian to help answer questions like:
- What tools can this agent access?
- What type of data does it handle?
- Does it receive untrusted external input?
- Can it take actions automatically?
- Is human approval required?
- What risks should the team fix before deployment?
What AgentGuardian Does
AgentGuardian lets a user describe an AI agent workflow and then generates a security risk review.
The user enters information such as:
- Agent name
- Agent purpose
- Tools the agent can access
- Data types the agent handles
- External inputs the agent receives
- Autonomy level
- Human approval requirements
Then AgentGuardian produces:
- A risk score from 0 to 100
- A risk level: Low, Medium, High, or Critical
- A risk category breakdown
- A detected risks table
- Recommended controls
- A local LLM-generated security summary
- A downloadable Markdown security report
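The mapping from a 0 to 100 score to a named risk level can be illustrated with a small function. This is a minimal sketch only: the function name `risk_level` and the cut-offs (25, 50, 75) are assumptions chosen for illustration, not the thresholds AgentGuardian actually uses.

```python
def risk_level(score: int) -> str:
    """Map a 0-100 risk score to a named level.

    The cut-offs below are hypothetical; a real tool would tune them.
    """
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    if score >= 75:
        return "Critical"
    if score >= 50:
        return "High"
    if score >= 25:
        return "Medium"
    return "Low"


if __name__ == "__main__":
    for example in (10, 40, 60, 95):
        print(example, risk_level(example))
```

Keeping this step as a pure function means the same score always yields the same level, which is easy to unit test.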
What does it look like, you ask?
The Architecture
AgentGuardian has two main layers.
1. Rule-Based Risk Engine
The rule-based engine is responsible for the actual scoring.
This was an intentional design choice. I did not want the LLM to decide the risk score because LLM outputs can vary. Instead, the deterministic Python engine assigns risk points based on clear conditions.
For example:
- If the agent receives emails, uploaded files, websites, or user messages, prompt injection risk increases.
- If the agent can access email, files, databases, payment systems, or code execution, tool misuse risk increases.
- If the agent handles financial data, health data, credentials, customer records, or student records, sensitive data exposure risk increases.
- If the agent can execute actions automatically, autonomy risk increases.
- If human approval is not required, human oversight risk increases.
This makes the scoring more explainable and consistent.
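The conditions above could be expressed as a small deterministic engine along these lines. This is a sketch under stated assumptions: the `AgentProfile` class, the `score_agent` function, the trigger sets, and every point value and cap are hypothetical and only illustrate the idea of rule-based scoring, not the project's real weights.

```python
from dataclasses import dataclass, field

# Hypothetical trigger sets, based on the examples listed in the text.
UNTRUSTED_INPUTS = {"Emails", "Uploaded files", "Websites", "User messages"}
HIGH_IMPACT_TOOLS = {"Email", "Files", "Database", "Payment system", "Code execution"}
SENSITIVE_DATA = {
    "Financial data", "Health data", "Credentials or secrets",
    "Customer records", "Student records",
}


@dataclass
class AgentProfile:
    tools: set[str] = field(default_factory=set)
    data_types: set[str] = field(default_factory=set)
    external_inputs: set[str] = field(default_factory=set)
    executes_automatically: bool = False
    human_approval_required: bool = True


def score_agent(profile: AgentProfile) -> tuple[int, dict[str, int]]:
    """Return (total, per-category breakdown). Point values are assumed.

    Each category is capped so the total can never exceed 100.
    """
    breakdown = {
        "Prompt injection": min(25, 10 * len(profile.external_inputs & UNTRUSTED_INPUTS)),
        "Tool misuse": min(25, 10 * len(profile.tools & HIGH_IMPACT_TOOLS)),
        "Sensitive data exposure": min(20, 10 * len(profile.data_types & SENSITIVE_DATA)),
        "Excessive autonomy": 15 if profile.executes_automatically else 0,
        "Lack of human oversight": 0 if profile.human_approval_required else 15,
    }
    return sum(breakdown.values()), breakdown


if __name__ == "__main__":
    invoice_agent = AgentProfile(
        tools={"Email", "Files", "Database", "Payment system"},
        data_types={"Financial data", "Customer records", "Credentials or secrets"},
        external_inputs={"Emails", "Uploaded files", "API responses"},
        executes_automatically=True,
        human_approval_required=False,
    )
    total, breakdown = score_agent(invoice_agent)
    print(total, breakdown)  # total is 95 with these assumed weights
```

Because the function uses only set intersections and fixed constants, the same profile always produces the same score, and the breakdown shows which rule contributed each point.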
2. Local LLM Security Summary
The second layer uses Ollama to generate a readable security analysis based on the rule-based findings.
The LLM does not determine the score. It explains the score.
This keeps the project local-first while still making the final output useful for developers, security teams, and non-technical stakeholders.
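One way to wire this up is to send the rule-based findings to Ollama's local HTTP endpoint and ask the model only to explain them. The sketch below assumes Ollama is running on its default port and uses its `/api/generate` endpoint with streaming turned off. The helper names `build_prompt` and `summarize`, the prompt wording, and the choice of raw HTTP rather than a client library are assumptions, not necessarily how AgentGuardian does it.

```python
import json
import urllib.request

# Default address of a locally running Ollama server.
OLLAMA_URL = "http://localhost:11434/api/generate"


def build_prompt(agent_name: str, score: int, level: str, findings: list[str]) -> str:
    """Build a prompt that passes the fixed score to the model as a given fact."""
    lines = [
        "You are a security reviewer. Explain the findings below in plain language.",
        "Do not change the score or the risk level.",
        f"Agent: {agent_name}",
        f"Risk score: {score}/100 ({level})",
        "Findings:",
    ]
    lines += [f"- {finding}" for finding in findings]
    return "\n".join(lines)


def summarize(prompt: str, model: str = "llama3.2", timeout: int = 120) -> str:
    """Send the prompt to the local model and return its text response."""
    payload = json.dumps({"model": model, "prompt": prompt, "stream": False}).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read())["response"]


if __name__ == "__main__":
    prompt = build_prompt(
        "Invoice Payment Agent", 95, "Critical",
        ["Prompt injection via emails", "Payment system access without approval"],
    )
    print(summarize(prompt))  # requires a running Ollama server with the model pulled
```

The score and level are computed before the prompt is built, so whatever the model writes cannot alter them.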
Why Local-First?
I wanted this project to avoid external API dependencies.
Ollama lets the app use a locally hosted model, which is downloaded first with a command such as:
ollama pull llama3.2
or:
ollama pull llama3.1:8b
Then the Streamlit app can call the local model to generate the security summary.
This is useful for a security-focused tool because many AI agent workflows may involve sensitive business logic, internal data, or private use cases. A local-first approach reduces dependency on external LLM APIs and makes the prototype easier to run in controlled environments.
Building the Streamlit Interface
The web app is built with Streamlit.
The main interface includes three tabs:
- Agent Workflow Scanner
- Risk Knowledge Base
- Sample Scenarios
The scanner tab collects the agent profile. The knowledge base explains common risks such as prompt injection, tool misuse, sensitive data exposure, excessive autonomy, and insecure output handling. The sample scenarios help users test the tool with realistic agent workflows.
One important usability improvement was adding validation so the app does not generate a report when the user submits an empty form.
That small check makes the app feel less like a rough prototype and more like a usable security tool.
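A check like this can be kept separate from the UI code so it is easy to test. The following is a minimal sketch: the function name `validate_profile`, the form keys, and the exact rules are assumptions made for illustration.

```python
def validate_profile(form: dict) -> list[str]:
    """Return a list of validation messages; an empty list means the form is usable.

    The field names used here are hypothetical.
    """
    errors = []
    if not (form.get("agent_name") or "").strip():
        errors.append("Agent name is required.")
    if not (form.get("purpose") or "").strip():
        errors.append("Agent purpose is required.")
    if not (form.get("tools") or form.get("data_types") or form.get("external_inputs")):
        errors.append("Select at least one tool, data type, or external input.")
    return errors


if __name__ == "__main__":
    print(validate_profile({}))
    print(validate_profile({
        "agent_name": "Invoice Payment Agent",
        "purpose": "Approves payments",
        "tools": ["Payment system"],
    }))
```

In a Streamlit app, each returned message could be shown with `st.error`, and the report step skipped whenever the list is not empty.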
Example High-Risk Scenario
One sample scenario is an invoice payment agent:
Agent Name:
Invoice Payment Agent
Purpose:
Reads invoices, verifies vendor records, and automatically approves payments under $5,000.
Tools:
Email, Files, Database, Payment system
Data Types:
Financial data, Customer records, Credentials or secrets
External Inputs:
Emails, Uploaded files, API responses
Autonomy Level:
Executes automatically
Human Approval:
Not required
This workflow creates several risks:
- Prompt injection through emails or uploaded invoices
- Sensitive data exposure through financial records and credentials
- Tool misuse through access to payment systems
- Excessive autonomy because the agent can execute actions automatically
- Lack of human oversight because approval is not required
AgentGuardian classifies this as a High or Critical risk scenario and recommends safeguards such as human approval, least privilege, logging, input validation, and output review.
Downloadable Security Report
I also added a Markdown report generator.
The report includes:
- Agent profile
- Risk score
- Risk level
- Risk category breakdown
- Detected risks
- Recommended controls
- Local LLM-generated analysis
- Disclaimer
This makes the app more useful because users can save or share the security review.
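A report like this can be assembled as a plain string. The sketch below is illustrative only: the function name `build_report`, the section headings, and the disclaimer wording are assumptions, and only part of the agent profile is shown.

```python
# Placeholder wording; the real report's disclaimer is not reproduced here.
DISCLAIMER = "This report is an automated review aid, not a full security assessment."


def build_report(name, purpose, score, level, breakdown, risks, controls, llm_summary):
    """Assemble the security review as a Markdown string."""
    lines = [
        f"# Security Review: {name}",
        "",
        "## Agent profile",
        f"- Purpose: {purpose}",
        "",
        "## Risk score",
        f"{score}/100 ({level})",
        "",
        "## Risk category breakdown",
        "| Category | Points |",
        "| --- | --- |",
    ]
    lines += [f"| {category} | {points} |" for category, points in breakdown.items()]
    lines += ["", "## Detected risks"]
    lines += [f"- {risk}" for risk in risks]
    lines += ["", "## Recommended controls"]
    lines += [f"- {control}" for control in controls]
    lines += ["", "## Local LLM analysis", llm_summary, "", "## Disclaimer", DISCLAIMER]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(build_report(
        "Invoice Payment Agent",
        "Reads invoices and approves payments under $5,000.",
        95, "Critical",
        {"Tool misuse": 25, "Excessive autonomy": 15},
        ["Payment system access without approval"],
        ["Require human approval for payments"],
        "(summary from the local model goes here)",
    ))
```

In a Streamlit app, the returned string could be passed as the `data` argument of `st.download_button`, with a `file_name` ending in `.md`.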
Lessons Learned
A few design decisions stood out while building AgentGuardian.
1. The LLM should explain, not decide
For this type of security tool, I wanted scoring to be deterministic. The LLM is helpful for summarization, but the core risk logic should be transparent and repeatable.
2. Agent security depends on combinations of risk factors
A tool is not risky by itself. A data type is not risky by itself. Autonomy is not risky by itself.
The risk increases when they combine.
For example:
External input + sensitive data + high-impact tools + automatic execution = dangerous workflow
That combination-based thinking became the foundation of the scoring engine.
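Combination-based scoring can be sketched as a bonus that only applies when several factors are present together. Everything here is hypothetical: the function names `combination_bonus` and `apply_bonus` and the bonus table are invented for illustration and are not AgentGuardian's actual logic.

```python
def combination_bonus(
    has_untrusted_input: bool,
    has_sensitive_data: bool,
    has_high_impact_tools: bool,
    executes_automatically: bool,
) -> int:
    """Return extra risk points based on how many factors occur together.

    A single factor adds nothing; the bonus grows as factors stack up.
    """
    count = sum([
        has_untrusted_input,
        has_sensitive_data,
        has_high_impact_tools,
        executes_automatically,
    ])
    bonus_by_count = {0: 0, 1: 0, 2: 5, 3: 10, 4: 20}  # assumed values
    return bonus_by_count[count]


def apply_bonus(base_score: int, bonus: int) -> int:
    """Add the bonus to a base score while keeping the result within 0-100."""
    return max(0, min(100, base_score + bonus))


if __name__ == "__main__":
    bonus = combination_bonus(True, True, True, True)
    print(bonus, apply_bonus(70, bonus))
```

The point of the table is that the jump from three factors to four is larger than the jump from two to three, which reflects the idea that the full combination is what makes a workflow dangerous.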
3. Local-first AI is very useful for security prototypes
Ollama made it easy to add an LLM layer without relying on external API calls. That is especially helpful for security-focused use cases where data sensitivity matters.
4. A simple prototype can still communicate a strong idea
AgentGuardian is not a full enterprise security product, but it demonstrates a useful pattern:
structured input → explainable risk scoring → local LLM analysis → downloadable report
That pattern could be expanded into a larger AI governance or AI security review workflow.
Future Improvements
Some possible next steps include:
- Adding clickable sample scenarios using Streamlit session state
- Mapping risks more explicitly to the OWASP Top 10 for LLM Applications and to OWASP's agentic AI guidance
- Exporting reports as PDF
- Adding Docker support
- Comparing multiple agent workflows side by side
- Adding configurable risk weights
- Adding enterprise policy recommendations
- Adding more local model options
- Adding a threat modeling mode
Final Thoughts
Agentic AI systems are powerful because they can act. But that also means they need security review before deployment.
AgentGuardian is a small step toward making that review process more practical, explainable, and accessible.
The project combines a rule-based security engine with a local LLM layer to help developers and security teams think through agent risks before those risks become production problems.
GitHub repo: https://github.com/zosob/AgentGuardian.git
