nagasatish chilakamarti

Posted on Apr 19

How to Add Governance to AI Pentesting Agents

#ai #security #agents #aigovernance

Autonomous AI agents are now running nmap, gobuster, and nikto. Here's how to make sure they don't go rogue.

The Rise of AI Pentesting Agents

AI-directed penetration testing is here. Projects like Talon by CarbeneAI give Claude Code secure SSH access to Kali Linux — you describe what you want to test in plain English, and the AI runs the tools, interprets output, and suggests next steps.

This is powerful. It's also exactly the kind of autonomous agent behavior that needs governance.

When an AI agent can execute nmap -sV -sC target, parse the results, pivot to gobuster for directory enumeration, and then run nikto for vulnerability scanning — all without human intervention — you need answers to some hard questions:

What tools is the AI allowed to run? (Should it be able to run rm -rf or dd?)
What happens when it finds credentials? (Are they logged? Stored? Redacted?)
How do you audit what the AI did? (Can you produce a SARIF report for compliance?)
What if the AI enters a retry loop? (Hammering a target with 10,000 requests?)
How do you verify the AI stayed in scope? (Only testing authorized targets?)

These aren't hypothetical concerns. They're the OWASP Top 10 for Agentic Applications in action.
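The scope question above is one of the easier ones to check mechanically, before any command reaches the target. Here is a minimal sketch, assuming the authorized scope is expressed as IPv4 CIDR ranges. `ipv4ToInt`, `inCidr`, and `isInScope` are illustrative names, not part of TealTiger or Talon.

```typescript
// Illustrative scope check. None of these names come from TealTiger or Talon.

/** Parse a dotted-quad IPv4 address into a number, or return null if malformed. */
function ipv4ToInt(ip: string): number | null {
  const parts = ip.split('.');
  if (parts.length !== 4) return null;
  let value = 0;
  for (let i = 0; i < parts.length; i++) {
    if (!/^\d{1,3}$/.test(parts[i])) return null;
    const octet = Number(parts[i]);
    if (octet > 255) return null;
    value = value * 256 + octet;
  }
  return value;
}

/** True when `ip` falls inside `cidr`, for example "10.0.0.0/24". */
function inCidr(ip: string, cidr: string): boolean {
  const match = /^(.+)\/(\d{1,2})$/.exec(cidr);
  if (match === null) return false;
  const prefixBits = Number(match[2]);
  const ipInt = ipv4ToInt(ip);
  const baseInt = ipv4ToInt(match[1]);
  if (ipInt === null || baseInt === null || prefixBits > 32) return false;
  // Dividing by the block size discards the host bits, leaving the network part.
  const blockSize = Math.pow(2, 32 - prefixBits);
  return Math.floor(ipInt / blockSize) === Math.floor(baseInt / blockSize);
}

/** Default-deny: a target is in scope only if some authorized range contains it. */
function isInScope(target: string, authorizedRanges: string[]): boolean {
  return authorizedRanges.some((range) => inCidr(target, range));
}
```

This handles literal IPv4 addresses only. Hostnames would need to be resolved first and the resolved address checked, and malformed input is treated as out of scope.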

Enter Governance: TealTiger + AI Pentesting

TealTiger is an open-source AI agent security SDK that provides governance, guardrails, and evidence for LLM applications. Its v1.2 governance bundle introduces 7 modules across 6 governance dimensions — and they map directly to pentesting agent risks.

1. Tool Allowlisting with TealRegistry

The first rule of AI pentesting: the agent should only run tools you've explicitly approved.

import { TealRegistry } from 'tealtiger/registry';

const registry = new TealRegistry({
  catalogs: {
    tools: {
      entries: [
        { id: 'nmap', version: '7.94', catalog: 'tools' },
        { id: 'gobuster', version: '3.6', catalog: 'tools' },
        { id: 'nikto', version: '2.5.0', catalog: 'tools' },
        // rm, dd, wget — NOT listed = DENIED
      ]
    }
  }
});

// When the AI tries to run a tool (ctx and policy are the request context
// and active policy, assumed to be set up elsewhere):
const decision = await registry.evaluate({
  content: 'rm -rf /tmp/loot',
  metadata: { tool_id: 'rm' }
}, ctx, policy);

// decision.action === 'DENY'
// decision.reason_codes === ['TOOL_NOT_ALLOWLISTED']

If the AI tries to execute a tool not in the allowlist, TealRegistry returns DENY with the TEEC reason code TOOL_NOT_ALLOWLISTED. No ambiguity. No silent failures.
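To make the default-deny idea concrete without depending on the SDK, here is a library-independent sketch of the same check. It borrows the `DENY` action and the `TOOL_NOT_ALLOWLISTED` reason code from the example above. `ToolDecision`, `checkTool`, and the `COMMAND_CHAINING_BLOCKED` code are assumptions made for illustration and may not match how TealRegistry works internally.

```typescript
// Default-deny allowlist sketch. ToolDecision and checkTool are illustrative
// names; this is not TealRegistry's implementation.
type ToolDecision = {
  action: 'ALLOW' | 'DENY';
  reason_codes: string[];
};

const ALLOWED_TOOLS: string[] = ['nmap', 'gobuster', 'nikto'];

function checkTool(command: string): ToolDecision {
  // Reject shell metacharacters so "nmap host; rm -rf /" cannot chain a second
  // command. COMMAND_CHAINING_BLOCKED is a made-up code, not a TEEC reason code.
  if (/[;&|`$<>\n]/.test(command)) {
    return { action: 'DENY', reason_codes: ['COMMAND_CHAINING_BLOCKED'] };
  }
  // The executable is the first whitespace-separated token, matched exactly.
  // A path such as "/tmp/nmap" is therefore denied rather than trusted by basename.
  const executable = command.trim().split(/\s+/)[0];
  if (ALLOWED_TOOLS.indexOf(executable) === -1) {
    return { action: 'DENY', reason_codes: ['TOOL_NOT_ALLOWLISTED'] };
  }
  return { action: 'ALLOW', reason_codes: [] };
}
```

Matching only the first token is deliberate: checking whether an allowlisted name appears anywhere in the string would let `rm -rf / # nmap` through.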

2. Credential Detection with TealSecrets

Pentesting agents find credentials. That's the point. But those credentials shouldn't leak into logs, chat history, or unredacted reports.

import { TealSecrets } from 'tealtiger/secrets';

const secrets = new TealSecrets();

// AI finds an SSH key during enumeration
const findings = await secrets.scan(scanOutput, ctx);

// findings[0].type === 'ssh_private_key'
// findings[0].category === 'infrastructure'
// findings[0].confidence === 0.97
// findings[0].severity === 'CRITICAL'

// Policy enforcement: REDACT the key from evidence
const decision = await secrets.evaluate(request, ctx, policy);
// decision.action === 'REDACT'
// decision.reason_codes === ['SECRET_DETECTED']

TealSecrets detects 500+ secret patterns across 9 categories. The confidence scorer uses Shannon entropy, structural matching, and context proximity to minimize false positives. Raw secret values never appear in evidence by default.
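Of the three signals mentioned, Shannon entropy is the easiest to show in isolation. The sketch below computes entropy in bits per character and applies a simple length-plus-entropy threshold. The function names and the thresholds (20 characters, 4.0 bits) are assumptions chosen for illustration, not values taken from TealSecrets, and the structural and context signals are not modeled.

```typescript
// Entropy-only sketch. Names and thresholds are illustrative assumptions.

/** Shannon entropy of `text` in bits per character: H = -sum(p * log2(p)). */
function shannonEntropy(text: string): number {
  const counts: Record<string, number> = {};
  for (let i = 0; i < text.length; i++) {
    const ch = text.charAt(i);
    counts[ch] = (counts[ch] || 0) + 1;
  }
  let entropy = 0;
  const symbols = Object.keys(counts);
  for (let i = 0; i < symbols.length; i++) {
    const p = counts[symbols[i]] / text.length;
    entropy -= p * (Math.log(p) / Math.LN2);
  }
  return entropy;
}

/** Flag tokens that are both long and close to random-looking. */
function isHighEntropyToken(token: string, minLength = 20, minBits = 4.0): boolean {
  return token.length >= minLength && shannonEntropy(token) >= minBits;
}
```

Repetitive strings score near 0 bits and a string of 32 distinct characters scores exactly 5 bits, which is why entropy alone separates generated keys from ordinary words reasonably well but still needs the other signals to keep false positives down.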

3. Retry Budget Enforcement with TealReliability

An AI pentesting agent stuck in a retry loop can look, from the target's side, like a denial-of-service attack.

import { TealReliability } from 'tealtiger/reliability';

const reliability = new TealReliability({
  retry: {
    maxAttempts: 3,
    budgetMs: 10000,  // 10 second total budget
    transientCodes: [429, 500, 502, 503]
  },
  circuitBreaker: {
    failureThreshold: 5,
    cooldownMs: 30000  // 30 second cooldown
  }
});

// If the target returns 5 consecutive failures:
// Circuit breaker OPENS → zero retry attempts → CIRCUIT_OPEN emitted
// AI is forced to fallback or stop — no retry storm

The circuit breaker state machine (CLOSED → OPEN → HALF_OPEN → CLOSED) prevents the AI from hammering unresponsive targets. RETRY_BUDGET_EXCEEDED and CIRCUIT_OPEN are TEEC reason codes that appear in the audit trail.
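The state machine itself is small enough to sketch in full. The version below is a generic circuit breaker, not TealReliability's implementation: it reuses the `failureThreshold` and `cooldownMs` names from the config above, while the class, its methods, and the injectable `now` clock are assumptions added so the transitions can be exercised deterministically.

```typescript
// Generic circuit breaker sketch. The class and method names are illustrative.
type BreakerState = 'CLOSED' | 'OPEN' | 'HALF_OPEN';

class CircuitBreaker {
  private state: BreakerState = 'CLOSED';
  private consecutiveFailures = 0;
  private openedAt = 0;
  private readonly failureThreshold: number;
  private readonly cooldownMs: number;
  private readonly now: () => number;

  constructor(failureThreshold: number, cooldownMs: number, now: () => number = () => Date.now()) {
    this.failureThreshold = failureThreshold;
    this.cooldownMs = cooldownMs;
    this.now = now;
  }

  /** Ask before every request. OPEN becomes HALF_OPEN once the cooldown has elapsed. */
  canAttempt(): boolean {
    if (this.state === 'OPEN' && this.now() - this.openedAt >= this.cooldownMs) {
      this.state = 'HALF_OPEN';
    }
    return this.state !== 'OPEN';
  }

  /** Any success closes the circuit and clears the failure count. */
  recordSuccess(): void {
    this.consecutiveFailures = 0;
    this.state = 'CLOSED';
  }

  /** A failed probe in HALF_OPEN reopens at once; otherwise count toward the threshold. */
  recordFailure(): void {
    if (this.state === 'HALF_OPEN') {
      this.trip();
      return;
    }
    this.consecutiveFailures += 1;
    if (this.consecutiveFailures >= this.failureThreshold) {
      this.trip();
    }
  }

  getState(): BreakerState {
    return this.state;
  }

  private trip(): void {
    this.state = 'OPEN';
    this.openedAt = this.now();
    this.consecutiveFailures = 0;
  }
}
```

With `new CircuitBreaker(5, 30000)`, five consecutive failures open the circuit, `canAttempt()` returns false for the next 30 seconds, and a single probe request then decides whether the circuit closes again or reopens.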

4. Memory Governance with TealMemory

AI pentesting agents maintain engagement notes — what they found, what they tried, what worked. This memory needs governance.

import { TealMemory } from 'tealtiger/memory';

const memory = new TealMemory({
  adapter: localAdapter,
  policy: {
    allowed_scopes: ['SESSION', 'USER'],
    max_ttl_ms: 86400000,  // 24 hours — engagement data expires
    content_scan: true       // Scan writes for secrets/PII
  }
});

// AI tries to store found credentials in memory
const decision = await memory.write({
  scope: 'SESSION',
  classification: 'RESTRICTED',
  value: 'root:toor (found on 10.0.0.5:22)',
  ttl_ms: 86400000
}, ctx);

// decision.action === 'REDACT_AND_WRITE'
// Credential value is hashed before storage
// reason_codes === ['MEMORY_WRITE_REDACTED']

TealMemory enforces scope boundaries, classification clearance, TTL expiration, and content scanning. Raw credentials found during pentesting are redacted (hashed, in the example above) before they reach persistent storage.
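As a rough illustration of how scope checks, TTL limits, and write-time scanning fit together, here is a small in-memory sketch. It is not TealMemory's implementation: the class, the `MEMORY_SCOPE_DENIED` and `MEMORY_TTL_EXCEEDED` codes, and the deliberately crude `user:password` pattern are all assumptions, and it replaces the secret with a placeholder rather than hashing it.

```typescript
// In-memory governance sketch. All names here are illustrative assumptions.
type Scope = 'SESSION' | 'USER' | 'GLOBAL';

interface MemoryPolicy { allowedScopes: Scope[]; maxTtlMs: number; }
interface WriteRequest { key: string; scope: Scope; value: string; ttlMs: number; }
interface WriteResult { action: 'WRITE' | 'REDACT_AND_WRITE' | 'DENY'; reason_codes: string[]; }

// Crude pattern for "user:password" pairs such as "root:toor". A real scanner
// would use far more specific detectors.
const CREDENTIAL_PATTERN = /\b([A-Za-z_][A-Za-z0-9_-]*):(\S+)/g;

class GovernedMemory {
  private entries: Record<string, { value: string; expiresAt: number }> = {};
  private readonly policy: MemoryPolicy;
  private readonly now: () => number;

  constructor(policy: MemoryPolicy, now: () => number = () => Date.now()) {
    this.policy = policy;
    this.now = now;
  }

  write(req: WriteRequest): WriteResult {
    if (this.policy.allowedScopes.indexOf(req.scope) === -1) {
      return { action: 'DENY', reason_codes: ['MEMORY_SCOPE_DENIED'] }; // hypothetical code
    }
    if (req.ttlMs > this.policy.maxTtlMs) {
      return { action: 'DENY', reason_codes: ['MEMORY_TTL_EXCEEDED'] }; // hypothetical code
    }
    // Keep the username for context, drop the secret before anything is stored.
    const redacted = req.value.replace(CREDENTIAL_PATTERN, '$1:[REDACTED]');
    this.entries[req.scope + '/' + req.key] = {
      value: redacted,
      expiresAt: this.now() + req.ttlMs,
    };
    return redacted === req.value
      ? { action: 'WRITE', reason_codes: [] }
      : { action: 'REDACT_AND_WRITE', reason_codes: ['MEMORY_WRITE_REDACTED'] };
  }

  /** Expired entries are removed lazily, on the first read after their TTL. */
  read(scope: Scope, key: string): string | undefined {
    const id = scope + '/' + key;
    const entry = this.entries[id];
    if (!entry) return undefined;
    if (this.now() >= entry.expiresAt) {
      delete this.entries[id];
      return undefined;
    }
    return entry.value;
  }
}
```

Redacting at write time, rather than at read time, means the raw credential never exists in the store, so a later policy change or a direct dump of the backing storage cannot expose it.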

5. Evidence Export with TealVerify

Every pentest needs a report. TealVerify generates SARIF v2.1.0, JUnit XML, and JSON evidence — ready for compliance review.

import { SARIFExporter } from 'tealtiger/verify';

const exporter = new SARIFExporter({ redactSecrets: true });
const sarif = exporter.export(findings, ctx);

// Upload to GitHub Security UI
// sarif.runs[0].results → each finding with stable rule IDs
// sarif.runs[0].tool.driver.name === 'TealTiger'
// All raw secrets redacted by default

The SARIF output can be uploaded to GitHub code scanning, where findings appear in the repository's Security tab. Golden tests verify that your governance policies produce the expected decisions. The red-team harness generates adversarial inputs to find policy bypasses before production.
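For readers who have not looked inside a SARIF file, the required core is small: a `version`, and a `runs` array whose entries name the tool and list results keyed by rule ID. The sketch below builds that minimal shape by hand. The `Finding` type, the `toSarif` function, and the example rule ID and evidence path are hypothetical, and this is not the output format of TealVerify's `SARIFExporter`.

```typescript
// Minimal SARIF v2.1.0 builder. Finding and toSarif are illustrative names.
interface Finding {
  ruleId: string;                        // stable identifier for the kind of finding
  level: 'note' | 'warning' | 'error';   // SARIF result levels
  message: string;                       // human-readable text, already redacted
  uri: string;                           // evidence artifact the finding points at
}

function toSarif(findings: Finding[], toolName: string) {
  // Each distinct rule is declared once under tool.driver.rules.
  const ruleIds: string[] = [];
  findings.forEach((finding) => {
    if (ruleIds.indexOf(finding.ruleId) === -1) ruleIds.push(finding.ruleId);
  });
  return {
    $schema: 'https://json.schemastore.org/sarif-2.1.0.json',
    version: '2.1.0',
    runs: [
      {
        tool: { driver: { name: toolName, rules: ruleIds.map((id) => ({ id: id })) } },
        results: findings.map((finding) => ({
          ruleId: finding.ruleId,
          level: finding.level,
          message: { text: finding.message },
          locations: [{ physicalLocation: { artifactLocation: { uri: finding.uri } } }],
        })),
      },
    ],
  };
}

// Hypothetical example data.
const exampleSarif = toSarif(
  [{ ruleId: 'secret/ssh-private-key', level: 'error', message: 'SSH private key found (value redacted)', uri: 'evidence/enum-output.txt' }],
  'example-pentest-agent'
);
const exampleSarifJson = JSON.stringify(exampleSarif, null, 2);
```

Keeping rule IDs stable across runs matters more than the rest of the structure, since that is what lets a reviewer track the same finding from one engagement to the next.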

The TEEC Evidence Contract

All of this is unified by TEEC v0.1.0 (TealTiger Event & Evidence Contract) — a formal contract defining 32 reason codes, 18 event types, and 12 decision actions. Every governance decision produces a Decision object with:

action: What happened (ALLOW, DENY, REDACT, DEGRADE, etc.)
reason_codes: Why (TOOL_NOT_ALLOWLISTED, SECRET_DETECTED, CIRCUIT_OPEN, etc.)
correlation_id: Trace ID linking all decisions in a session
teec_version: "0.1.0" — frozen contract for deterministic CI assertions

This means you can write golden tests that assert: "When the AI tries to run rm, the decision MUST be DENY with reason code TOOL_NOT_ALLOWLISTED." And run those tests in CI on every policy change.
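A golden test of this kind boils down to comparing the stable fields of a decision against a recorded expectation. Here is a minimal sketch using the four Decision fields listed above. `GoldenExpectation` and `diffAgainstGolden` are hypothetical names rather than TealVerify's API, and ignoring the order of reason codes is an assumption about what should count as a match.

```typescript
// Golden comparison sketch. GoldenExpectation and diffAgainstGolden are illustrative.
interface Decision {
  action: string;
  reason_codes: string[];
  correlation_id: string;
  teec_version: string;
}

interface GoldenExpectation {
  action: string;
  reason_codes: string[];
  teec_version: string;
}

/** Return a list of mismatches; an empty list means the decision matches the golden. */
function diffAgainstGolden(actual: Decision, golden: GoldenExpectation): string[] {
  const mismatches: string[] = [];
  if (actual.action !== golden.action) {
    mismatches.push('action: expected ' + golden.action + ', got ' + actual.action);
  }
  // Reason codes are compared ignoring order.
  const actualCodes = actual.reason_codes.slice().sort().join(',');
  const goldenCodes = golden.reason_codes.slice().sort().join(',');
  if (actualCodes !== goldenCodes) {
    mismatches.push('reason_codes: expected [' + goldenCodes + '], got [' + actualCodes + ']');
  }
  if (actual.teec_version !== golden.teec_version) {
    mismatches.push('teec_version: expected ' + golden.teec_version + ', got ' + actual.teec_version);
  }
  // correlation_id is intentionally not compared: it differs on every run.
  return mismatches;
}

// The expectation quoted in the text, written down as data.
const rmMustBeDenied: GoldenExpectation = {
  action: 'DENY',
  reason_codes: ['TOOL_NOT_ALLOWLISTED'],
  teec_version: '0.1.0',
};
```

In CI, a non-empty result from `diffAgainstGolden` would fail the build, so a policy edit that silently turns the `rm` case into an ALLOW is caught before it ships.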

OWASP ASI Coverage

TealTiger v1.2 maps directly to the OWASP Top 10 for Agentic Applications:

OWASP ASI	Risk	TealTiger Module
ASI-01	Excessive Agency	TealRegistry (tool allowlisting)
ASI-02	Insufficient Access Control	TealMemory (scope/classification)
ASI-03	Knowledge Poisoning	TealRegistry (provenance verification)
ASI-04	Cascading Hallucination	TealReliability (circuit breaker)
ASI-05	Improper Output Handling	TealSecrets (redaction)
ASI-06	Privilege Escalation	TealRegistry + TealMemory
ASI-07	Denial of Service	TealReliability (retry budget)
ASI-08	Supply Chain Vulnerabilities	TealRegistry (supply chain scoring)
ASI-09	Logging & Monitoring Failures	TealVerify (SARIF/TEEC evidence)
ASI-10	Insecure Plugin Design	TealRegistry (tool governance)

Getting Started

# TypeScript
npm install tealtiger

# Python
pip install tealtiger[full]

TealTiger is MIT licensed, open source, and works with any LLM provider. The governance modules are additive — install only what you need.

GitHub: github.com/agentguard-ai/agentguard-ai/tealtiger
npm: npm install tealtiger
PyPI: pip install tealtiger

TealTiger v1.2 (not yet released) introduces the governance bundle: 7 modules, 6 dimensions, and 38 controls, unified by the TEEC v0.1.0 evidence contract. The TypeScript and Python SDKs share identical semantics.

Top comments (1)

IntSpired® • Apr 19

Good thread, especially the focus on tool governance and auditability.
One gap I often see is scope control. Most agent workflows assume everything happens on-network, but real-world exposure doesn’t stop there.
At IntSpired we approach engagements from the adversary’s perspective, which often extends beyond traditional network boundaries.