Hello Devs 👋
As developers, we all know how much time goes into repetitive checks:
- Did we add proper error handling?
- Are our functions following naming rules?
- Did we write enough tests?
- Are there hidden logic mistakes?
- Is reliability getting better or worse?
These tasks are important… but doing them manually for every commit or PR is exhausting and eats up time we would rather spend actually building features.
So I asked myself:
🤔 What if I could create an AI agent that does all of this automatically?
And that’s how the Reliability Guardian Agent was born.
In this article, I’ll walk you through building this AI agent step-by-step using Qodo Command, in the simplest way possible. By the end, you’ll have a powerful reliability reviewer that works locally and inside GitHub Actions.
Let’s begin. 🚀
💡 What is the Reliability Guardian?
The Reliability Guardian Agent automatically analyzes your codebase to evaluate and improve:
- Code reliability
- Fault tolerance
- Input validation
- Test coverage
- Error handling
- Historical reliability trends
It uses both static analysis and behavior-style testing (like simulated mutation or fuzz testing) to find:
- Logic inconsistencies
- Weak or missing tests
- Missing input validations
- Unsafe or fragile code paths
- Reliability regressions in recent commits
The agent runs both locally and in automated CI/CD workflows, and gives you a clean reliability score from 0 to 10 along with actionable suggestions.
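Mutation testing, which the agent simulates, is worth a quick illustration: you introduce a small deliberate bug (a "mutant") and check whether the test suite notices. The toy sketch below is not the agent's actual implementation, just the idea it relies on:

```python
# Toy illustration of mutation testing: introduce a small deliberate
# bug and check whether the tests catch ("kill") it.

def discount(amount):
    return 0.2 if amount > 100 else 0.1  # original logic

def discount_mutant(amount):
    return 0.2 if amount >= 100 else 0.1  # mutant: '>' flipped to '>='

def weak_test(fn):
    # Only checks a value far from the boundary, so the mutant survives.
    return fn(200) == 0.2

def strong_test(fn):
    # Also probes the boundary at exactly 100, so the mutant is killed.
    return fn(200) == 0.2 and fn(100) == 0.1

print(weak_test(discount), weak_test(discount_mutant))      # True True  -> mutant survives
print(strong_test(discount), strong_test(discount_mutant))  # True False -> mutant killed
```

A surviving mutant means the tests would not have caught that bug, which is exactly the "weak test" signal the agent reports.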
⚙️ Installing Qodo Command
We’ll use Qodo’s Agentic Quality Workflows CLI to build and run the agent.
Install it globally:
npm install -g @qodo/command
Then log in:
qodo login
Once login completes, you'll receive an API key in the terminal.
The key is also saved locally in the `.qodo` folder in your home directory, so it can be reused later (e.g., in CI).
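In CI you supply that key through an environment variable instead of logging in interactively. A minimal sketch, assuming the secret is exposed as `QODO_API_KEY` (the same name the GitHub Actions workflow later in this article uses); the placeholder value below is just for illustration:

```shell
# In CI, export the API key from your secret store instead of running
# `qodo login` interactively. QODO_API_KEY is the variable name the
# GitHub Actions step later in this article passes to the action.
export QODO_API_KEY="example-key-from-ci-secrets"   # placeholder value

# Sanity check that the key is set, without echoing the secret itself:
echo "QODO_API_KEY is set (${#QODO_API_KEY} chars)"

# The agent can then run non-interactively, e.g.:
#   qodo reliability_guardian --target_branch=main
```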
Creating the Reliability Guardian Agent
1️⃣ Create the agent config
At your project root, create reliable-guardian-agent.toml.
This file tells Qodo everything about your agent: instructions, arguments, strategy, and output format.
Now, paste the following configuration:
# Reliability Guardian Agent Configuration
version = "1.0"
[commands.reliability_guardian]
description = "Analyze and score project reliability by detecting logic conflicts, missing validations, weak tests, and historical reliability trends."
instructions = """
You are an expert reliability analyst agent. Your purpose is to evaluate the reliability of a software project by analyzing logic consistency, input validation completeness, and test suite robustness.
### Your mission:
1. **Analyze code for logic reliability**
- Detect logical conflicts, contradictory conditions, or redundant branches
- Identify missing input validation or unsafe operations (e.g., divide by zero, null dereference)
- Recognize missing or ineffective exception handling
2. **Evaluate test robustness**
- Perform mutation or fuzz testing to estimate how strong the existing tests are
- Identify functions that lack test coverage or only test “happy paths”
3. **Compute a comprehensive reliability score**
- Logic Consistency (30%)
- Input Validation Coverage (30%)
- Exception Safety (20%)
- Test Effectiveness (20%)
Provide an overall reliability score between 0–10.
4. **Detect reliability trends over time**
- Use Git history to compare reliability results across recent commits or branches
- Highlight improvement or regression in reliability score
5. **Suggest self-healing fixes**
- Suggest specific code improvements such as adding missing validation, refactoring conflicting branches, or adding stronger test cases
- Each fix suggestion should include a short code patch snippet where applicable
"""
arguments = [
{ name = "target_branch", type = "string", required = false, default = "main", description = "Branch to compare against for diff and reliability trend" },
{ name = "max_commits", type = "number", required = false, default = 5, description = "Number of past commits to analyze for historical reliability trends" },
{ name = "mutation_testing", type = "boolean", required = false, default = true, description = "Enable simulated mutation testing" },
{ name = "fuzz_testing", type = "boolean", required = false, default = true, description = "Enable fuzz-style reliability probing" },
{ name = "exclude_files", type = "string", required = false, description = "Comma-separated list of files to exclude (e.g., test mocks or migrations)" }
]
tools = ["qodo_merge", "git", "filesystem"]
execution_strategy = "act"
output_schema = """
{
"type": "object",
"properties": {
"summary": {
"type": "object",
"description": "High-level summary of reliability issues and test robustness",
"properties": {
"files_analyzed": { "type": "number", "description": "Total number of source files analyzed" },
"functions_checked": { "type": "number", "description": "Number of functions analyzed for logic reliability" },
"total_issues": { "type": "number", "description": "Total reliability issues detected" },
"critical_issues": { "type": "number", "description": "Number of critical logic or reliability flaws" },
"reliability_score": {
"type": "object",
"properties": {
"overall": { "type": "number", "minimum": 0, "maximum": 10 },
"logic_consistency": { "type": "number", "minimum": 0, "maximum": 10 },
"validation_coverage": { "type": "number", "minimum": 0, "maximum": 10 },
"exception_safety": { "type": "number", "minimum": 0, "maximum": 10 },
"test_strength": { "type": "number", "minimum": 0, "maximum": 10 }
},
"required": ["overall", "logic_consistency", "validation_coverage", "exception_safety", "test_strength"]
},
"trend": {
"type": "object",
"description": "Reliability trend compared to past commits",
"properties": {
"previous_scores": { "type": "array", "items": { "type": "number" } },
"improvement": { "type": "number", "description": "Positive if reliability improved, negative if regressed" },
"best_commit": { "type": "string", "description": "Commit hash with highest reliability" },
"worst_commit": { "type": "string", "description": "Commit hash with lowest reliability" }
}
}
},
"required": ["files_analyzed", "functions_checked", "total_issues", "reliability_score", "trend"]
},
"issues": {
"type": "array",
"description": "Detailed list of individual reliability issues",
"items": {
"type": "object",
"properties": {
"file": { "type": "string" },
"line": { "type": "number" },
"severity": { "type": "string", "enum": ["critical", "high", "medium", "low"] },
"category": { "type": "string", "description": "logic_conflict | validation_gap | weak_test | exception_risk" },
"description": { "type": "string" },
"suggestion": { "type": "string" },
"code_patch": { "type": "string", "description": "Example of an automated fix or patch suggestion" }
},
"required": ["file", "severity", "category", "description"]
}
},
"suggestions": {
"type": "array",
"description": "High-level reliability improvement recommendations",
"items": {
"type": "object",
"properties": {
"area": { "type": "string", "description": "validation | error_handling | logic | testing" },
"description": { "type": "string" },
"example_patch": { "type": "string" }
},
"required": ["area", "description"]
}
},
"approved": { "type": "boolean", "description": "Whether project meets reliability standards" },
"requires_changes": { "type": "boolean", "description": "True if reliability score < 7.0 or critical issues found" }
},
"required": ["summary", "issues", "suggestions", "approved"]
}
"""
exit_expression = "approved"
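The weighting in step 3 of the instructions is simple arithmetic, and it's worth seeing spelled out. The agent computes this itself; the sketch below just shows how the four component scores (each 0–10) combine into the overall 0–10 score:

```python
# Sketch of the weighted reliability score described in the agent
# instructions: logic 30%, validation 30%, exceptions 20%, tests 20%.
WEIGHTS = {
    "logic_consistency": 0.30,
    "validation_coverage": 0.30,
    "exception_safety": 0.20,
    "test_strength": 0.20,
}

def overall_score(components: dict) -> float:
    """Combine per-dimension scores (each 0-10) into one 0-10 score."""
    return round(sum(components[k] * w for k, w in WEIGHTS.items()), 1)

# Using the component scores from the sample output later in the article:
print(overall_score({
    "logic_consistency": 4.0,
    "validation_coverage": 3.0,
    "exception_safety": 4.0,
    "test_strength": 3.0,
}))  # -> 3.5
```

Note that 3.5 matches the `overall` value in the sample JSON output below, so the weights are consistent with what the agent reports.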
The fields of the agent file are:
| Field name | Type | Description |
|---|---|---|
| `description` | string | Description of what your agent does. Required when an agent is run with `--mcp`. |
| `instructions` | string | Required. Prompt for the AI models explaining the required behavior. |
| `arguments` | list of objects (supported types: `string`, `number`, `boolean`, `array`, `object`) | List of possible arguments that can be given to the agent. The arguments are translated and forwarded to MCP servers. |
| `mcpServers` | string | List of MCP servers used by the agent. |
| `tools` | list | List of MCP server names, letting you filter which MCP servers your agent may use. |
| `execution_strategy` | `"act"` or `"plan"` | `plan` lets the agent think through a multi-step strategy; `act` executes actions immediately. |
| `output_schema` | string | Valid JSON of the desired agent output. |
| `exit_expression` | string (JSONPath) | Only applicable when `output_schema` is given. For CI runs, a condition used to determine whether the agent run succeeded or failed. |
Our agent accepts the following parameters:
| Parameter | Type | Required | Default | Description |
|---|---|---|---|---|
| `target_branch` | string | No | `main` | Branch to compare against for reliability diff and trend analysis |
| `max_commits` | number | No | `5` | Number of recent commits to analyze for reliability trends |
| `mutation_testing` | boolean | No | `true` | Enable simulated mutation testing |
| `fuzz_testing` | boolean | No | `true` | Enable simulated fuzz-style reliability probing |
| `exclude_files` | string | No | – | Comma-separated list of files to exclude (e.g., mocks, generated code) |
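The `exclude_files` argument is a plain comma-separated string. How the agent matches it internally isn't documented here, but a reasonable interpretation (my assumption, not the agent's confirmed behavior) is glob-style filtering:

```python
from fnmatch import fnmatch

def filter_excluded(paths, exclude_files):
    """Drop paths matching any comma-separated pattern.

    One plausible reading of --exclude_files; the agent's actual
    matching rules may differ.
    """
    patterns = [p.strip() for p in (exclude_files or "").split(",") if p.strip()]
    return [p for p in paths if not any(fnmatch(p, pat) for pat in patterns)]

print(filter_excluded(
    ["src/payment.py", "tests/mocks.py", "migrations/0001_init.py"],
    "tests/mocks.py, migrations/*",
))  # -> ['src/payment.py']
```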
2️⃣ Run the agent locally
Run the agent from your project root (all arguments are optional):
qodo reliability_guardian
With Advanced Configuration
# Compare with another branch
qodo reliability_guardian --target_branch=develop
# Analyze last 10 commits for reliability trend
qodo reliability_guardian --max_commits=10
# Run without mutation or fuzz simulation
qodo reliability_guardian --mutation_testing=false --fuzz_testing=false
The tool then analyzes your codebase and returns structured JSON output.
3️⃣ Output Format
The agent returns structured JSON output:
{
"summary": {
"files_analyzed": 4,
"functions_checked": 8,
"total_issues": 18,
"critical_issues": 3,
"reliability_score": {
"overall": 3.5,
"logic_consistency": 4.0,
"validation_coverage": 3.0,
"exception_safety": 4.0,
"test_strength": 3.0
},
"trend": {
"previous_scores": [2.2],
"improvement": 1.3,
"best_commit": "065f7c9",
"worst_commit": "be1abae"
}
},
"issues": [
{
"file": "src/payment.py",
"line": 1,
"severity": "critical",
"category": "logic_conflict",
"description": "Premium users get worse discount (15%) when amount > 100 compared to base premium discount (20%). This is a business logic contradiction.",
"suggestion": "Invert the logic so higher amounts get better discounts (e.g., 25% for amount > 100, 20% otherwise)",
"code_patch": "if user_type == 'premium':\n discount = 0.25 if amount > 100 else 0.20"
},
{
"file": "src/calculator.py",
"line": 10,
"severity": "critical",
"category": "exception_risk",
"description": "average() function will raise ZeroDivisionError when passed an empty list",
"suggestion": "Add validation to check for empty input before division",
"code_patch": "if values is None or len(values) == 0:\n raise ValueError('values must be a non-empty sequence')"
},
{
"file": "src/utils.py",
"line": 1,
"severity": "critical",
"category": "exception_risk",
"description": "safe_get() catches all exceptions with 'except Exception', masking programming errors and making debugging difficult",
"suggestion": "Only catch specific exceptions (KeyError, TypeError) to avoid hiding unrelated bugs",
"code_patch": "except (KeyError, TypeError):\n return default"
},
{
"file": "src/auth.py",
"line": 5,
"severity": "high",
"category": "validation_gap",
"description": "authenticate_user() accepts any types without validation; no None/empty checks",
"suggestion": "Add type and empty string validation before authentication logic",
"code_patch": "if not isinstance(username, str) or not isinstance(password, str):\n return False\nif not username or not password:\n return False"
},
{
"file": "src/payment.py",
"line": 1,
"severity": "high",
"category": "validation_gap",
"description": "calculate_discount() lacks input validation for user_type domain and amount (negative values, type checking)",
"suggestion": "Add validation for user_type and amount before processing",
"code_patch": "if not isinstance(amount, (int, float)) or amount < 0:\n raise ValueError('amount must be a non-negative number')\nif user_type not in ('premium', 'basic'):\n raise ValueError(f'invalid user_type: {user_type}')"
},
{
"file": "src/calculator.py",
"line": 10,
"severity": "high",
"category": "validation_gap",
"description": "average() lacks validation for non-numeric elements in the list",
"suggestion": "Add type checking for all elements before processing",
"code_patch": "if not all(isinstance(x, (int, float)) for x in values):\n raise TypeError('all values must be numeric')"
},
{
"file": "src/calculator.py",
"line": 15,
"severity": "medium",
"category": "validation_gap",
"description": "add_safe() has misleading name suggesting validation, but performs no type enforcement; will concatenate strings or raise TypeError with None",
"suggestion": "Either add type validation or rename function to reflect actual behavior"
},
{
"file": "tests/test_auth.py",
"line": 6,
"severity": "high",
"category": "weak_test",
"description": "test_auth_admin expects authenticate_user('admin','123') to return True, but actual implementation requires password 'secret'. Test is failing.",
"suggestion": "Fix test to match actual implementation or fix implementation to match test contract",
"code_patch": "def test_auth_success():\n assert authenticate_user('admin', 'secret') is True"
},
{
"file": "tests/test_auth.py",
"line": 1,
"severity": "high",
"category": "weak_test",
"description": "test_email_valid only tests happy path; missing tests for invalid emails, None, empty strings",
"suggestion": "Add negative test cases for malformed emails",
"code_patch": "def test_email_invalid_cases():\n assert not validate_email('')\n assert not validate_email('invalid')\n assert not validate_email('a@b.')\n assert not validate_email('@example.com')"
}
],
"suggestions": [
{
"area": "logic",
"description": "Fix payment discount logic contradiction where premium users get worse discount for higher amounts. Invert the condition so amount > 100 gets 25% discount instead of 15%.",
"example_patch": "if user_type == 'premium':\n discount = 0.25 if amount > 100 else 0.20\nelse:\n discount = 0.10"
},
{
"area": "validation",
"description": "Add comprehensive input validation across all functions: type checking, None checks, empty collection checks, domain validation for enums, and range validation for numeric inputs.",
"example_patch": "if not isinstance(amount, (int, float)) or amount < 0:\n raise ValueError('amount must be a non-negative number')\nif user_type not in ('premium', 'basic'):\n raise ValueError(f'invalid user_type: {user_type}')"
},
{
"area": "error_handling",
"description": "Replace broad 'except Exception' clauses with specific exception types to avoid masking programming errors. Only catch expected exceptions like KeyError, TypeError, ValueError.",
"example_patch": "try:\n return d[key]\nexcept (KeyError, TypeError):\n return default"
},
{
"area": "validation",
"description": "Strengthen email validation to reject malformed patterns like 'a@b.', '@example.com', 'a@@b.com'. Implement proper parsing with split and validation of local/domain parts.",
"example_patch": "local, domain = email.split('@', 1)\nif not local or '.' not in domain:\n return False\nlabel, tld = domain.rsplit('.', 1)\nreturn bool(label) and len(tld) >= 2"
},
{
"area": "testing",
"description": "Expand test coverage to include edge cases and negative tests: empty inputs, None values, type errors, boundary conditions, and invalid domain values. Fix failing test in test_auth_admin.",
"example_patch": "def test_auth_failures():\n assert authenticate_user('admin', 'wrong') is False\n assert authenticate_user('', 'secret') is False\n assert authenticate_user(None, 'secret') is False"
},
{
"area": "testing",
"description": "Implement mutation testing to measure test effectiveness. Current tests likely have weak mutation kill rate due to minimal assertions and lack of negative tests.",
"example_patch": "# Run mutation testing with mutmut:\n# mutmut run --paths-to-mutate=src/\n# Expected improvement: mutation score from ~30% to >80% after adding edge case tests"
}
],
"approved": false,
"requires_changes": true
}
This gives a fast, automated overview of what needs to be fixed.
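If you want to consume this JSON yourself, outside the built-in `exit_expression`, a small script can reproduce the gate described in the schema: require changes when the overall score is below 7.0 or any critical issue exists. A sketch, assuming you've saved the agent's output to a file:

```python
import json

def gate(report: dict) -> bool:
    """Return True if the project passes the reliability gate.

    Mirrors the schema's requires_changes rule: fail when the overall
    score is below 7.0 or any critical issue is present.
    """
    score = report["summary"]["reliability_score"]["overall"]
    has_critical = any(i["severity"] == "critical" for i in report["issues"])
    return score >= 7.0 and not has_critical

# Example with (a trimmed version of) the sample report above;
# in practice you would load it, e.g. json.load(open("report.json")).
report = {
    "summary": {"reliability_score": {"overall": 3.5}},
    "issues": [{"severity": "critical"}],
}
print("PASS" if gate(report) else "FAIL")  # -> FAIL
```

The file name `report.json` is hypothetical; use whatever path you redirect the agent's output to.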
🤖 Add Reliability Guardian to GitHub Actions
The most common way to use this agent is through GitHub Actions, where it automatically reviews every pull request (PR).
Let's create a GitHub Actions workflow file for it.
GitHub Actions
name: Reliability Guardian Agent

on:
  pull_request:
    branches: [main, develop]

jobs:
  reliability-guardian:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write
      checks: write
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0

      - name: Run Reliability Guardian Agent
        uses: qodo-ai/command@v1
        env:
          QODO_API_KEY: ${{ secrets.QODO_API_KEY }}
          GITHUB_TOKEN: ${{ secrets.GITHUB_TOKEN }}
        with:
          prompt: reliability_guardian
          agent-file: reliable-guardian-agent.toml
          key-value-pairs: |
            target_branch=${{ github.base_ref }}
            max_commits=5
            mutation_testing=true
            fuzz_testing=true
Now every PR gets an automatic reliability review.
- No manual review overhead.
- No missed edge cases.
- No surprise runtime failures.
🎯 Why This Agent Saves Developers Massive Time
✅ Faster reviews: No more waiting on teammates for basic reliability checks.
✅ More consistent code: Same rules applied to every PR.
✅ More secure and stable builds: Many reliability issues are caught before merging.
✅ Developer time saved: Developers focus on building features, not repetitive reviewing.
✅ Customizable for any project: You can tune the rules, weights, and checks easily.
🎉 Final Thoughts
Qodo’s Agentic Quality Workflow is more than a CLI; it’s a new way of bringing intelligent automation into engineering teams.
The Reliability Guardian Agent is just one example of what you can build.
You can also create:
- Performance auditors
- Security checkers
- Test writers
- Documentation reviewers
- Code refactoring assistants
- And fully custom agents for your team
All using one simple, flexible agent file.
The cool thing is that you can build your own agents tailored to your project's needs. 😉
You can also visit the agent repository, which contains example agent implementations.
Thank You!!🙏
Thank you for reading this far. If you found this article useful, please like and share it; someone else might find it useful too. 💖