How to Conduct an AI Agent Security Assessment in 2026 | Day 19

#aiagenthijacking2026 #llmagentassessment #llmagentredteam #multiagentsecurity

📰 Originally published on Securityelites — AI Red Team Education — the canonical, fully-updated version of this article.

🤖 AI/LLM HACKING COURSE


Part of the AI/LLM Hacking Course — 90 Days

Day 19 of 90 · 21.1% complete

⚠️ Authorised Targets Only: AI agent security assessment — especially tool hijacking confirmation — must only be performed against authorised targets. Use Burp Collaborator or your own controlled endpoints for all out-of-band callback confirmations. Never trigger real-world agent actions (email sends, file modifications, API calls) against production data during testing without explicit agreement from the engagement contact.

The first time I assessed a real, production AI agent — not a demo, a real production system used by 2,000 employees — I spent the first thirty minutes just mapping what it could do. Email. Calendar. File access on the company SharePoint. A read connection to the HR system. Query capability against the customer CRM. The team that built it was proud of it. They should have been — it was impressive. I spent the next two hours demonstrating that any of those 2,000 employees who could get another employee to upload a specific document to the agent would be able to read that second employee’s calendar, emails, and HR record.

The finding wasn’t elegant. The injection payload was six sentences hidden in what appeared to be a standard quarterly report. The impact was complete visibility into the target employee’s work activity — emails, meetings, performance records — without any suspicious action required from either party. The agent was doing exactly what it was built to do. The problem was the gap between what it needed to do its job and what it had been given permission to do. That gap is what Day 19 is built to find systematically.

🎯 What You’ll Master in Day 19

Apply the Day 18 extraction output as the starting point for agent assessment — extracted tools become the attack targets
Build a permission gap matrix comparing granted vs required permissions for each discovered tool
Craft targeted tool hijacking payloads using exact function names from extraction
Execute indirect tool hijacking via document and email injection chains
Test multi-agent trust boundaries and inter-agent injection
Calculate maximum impact and write the complete chain finding for the report

⏱️ Day 19 · 3 exercises · Think Like Hacker + Kali Terminal + Browser

### ✅ Prerequisites

- Day 18, Advanced System Prompt Extraction: the extracted tool list is the input to the Day 19 assessment; completing extraction before starting agent testing saves significant time
- Day 10, LLM06 Excessive Agency: the permission gap analysis and tool hijacking foundations from Day 10 are extended into the full assessment methodology here
- Burp Collaborator access: out-of-band confirmation is essential for tool hijacking evidence that doesn’t cause real-world impact

### 📋 AI Agent Security Assessment — Day 19 Contents

1. The Agent Assessment Phases
2. Building the Permission Gap Matrix
3. Targeted Tool Hijacking With Exact Parameters
4. Indirect Injection Chains for Zero-Interaction Exploitation
5. Multi-Agent Trust Boundary Testing
6. Chain Finding Documentation for Maximum Severity

In Day 18 you recovered the system prompt and identified what tools the agent has. Day 19 uses that knowledge to run a complete agent security assessment. The extracted tool list is not just reconnaissance; it’s the test plan. Day 20 shifts focus to API-level reconnaissance: finding AI-powered endpoints that aren’t documented and don’t have the access controls their non-AI counterparts do.

The Agent Assessment Phases

Agent assessments have four phases. They run in sequence because each phase informs the next. Skipping phase one (extraction) means running phase two (permission analysis) blind. Skipping phase two means running phase three (tool hijacking) without knowing which tools have the most impact.

Phase one: extract the system prompt using the Day 18 methodology. Get the complete tool list, permission scope, and data access description. Phase two: build the permission gap matrix. What does the agent need vs what does it have? Every excess capability is a target. Phase three: direct tool hijacking. Test each excess tool using targeted payloads that name the exact function and supply valid-looking parameters. Phase four: indirect hijacking. Plant injection in documents and emails that the agent will process naturally, using the direct hijacking payloads as the embedded instruction. The indirect chain produces the Critical finding. The direct chain confirms the tool is hijackable before you invest time in the indirect delivery.
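The phase ordering described above can be sketched as a small planner. This is a minimal illustration, not the course's own tooling: the `Tool` class, its fields, and the example tool names are all hypothetical. It shows one way to enforce the rule that indirect tests are only planned for excess tools whose direct hijack has already been confirmed.

```python
from dataclasses import dataclass


@dataclass
class Tool:
    """One tool recovered during phase one (all fields are illustrative)."""
    name: str
    required: bool                  # phase two: needed for the stated purpose?
    direct_confirmed: bool = False  # phase three result, set after testing


def plan_direct_tests(tools):
    """Phase three targets: every tool that exceeds the stated purpose."""
    return [t.name for t in tools if not t.required]


def plan_indirect_tests(tools):
    """Phase four targets: excess tools already confirmed hijackable directly."""
    return [t.name for t in tools if not t.required and t.direct_confirmed]


# Hypothetical extraction output for a returns-policy assistant.
tools = [
    Tool("read_returns_policy", required=True),
    Tool("send_email", required=False, direct_confirmed=True),
    Tool("read_calendar", required=False),
]

direct_targets = plan_direct_tests(tools)
indirect_targets = plan_indirect_tests(tools)
print("Phase 3 (direct):", direct_targets)
print("Phase 4 (indirect):", indirect_targets)
```

Keeping the phase-three result on the tool record means the indirect plan cannot include a tool that was never confirmed directly, which mirrors the reason the phases run in sequence.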

Building the Permission Gap Matrix

The permission gap matrix is a table with one row per discovered tool. Columns: tool name, what it does, whether it’s required for the agent’s stated purpose, and the maximum impact if hijacked. Filling it out before testing determines which tools to prioritise — you’re not going to spend as much time on a calendar read tool as on an email send tool with no recipient restriction.

The “required” assessment is the most important column. Be strict about it. If the agent’s stated purpose is “answer customer service questions about product returns,” it needs read access to the returns policy document. It doesn’t need email send capability, calendar access, or the ability to query other customers’ records. Anything beyond the minimum creates a gap. Document it. Every gap entry in the matrix is a target for the next phase.
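A minimal sketch of the matrix as data, assuming the four columns described above. The `MatrixRow` class, the tool names, the impact ratings, and the four-level impact scale are assumptions made for illustration; the article does not define a rating scale. The example agent is the returns-policy assistant from the paragraph above.

```python
from dataclasses import dataclass

# Hypothetical impact scale; the article does not define one.
IMPACT_RANK = {"low": 1, "medium": 2, "high": 3, "critical": 4}


@dataclass
class MatrixRow:
    tool: str
    description: str
    required: bool   # strictly needed for the agent's stated purpose?
    max_impact: str  # key into IMPACT_RANK


def gaps_by_priority(rows):
    """Return tools that are granted but not required, highest impact first."""
    gaps = [r for r in rows if not r.required]
    return sorted(gaps, key=lambda r: IMPACT_RANK[r.max_impact], reverse=True)


# Illustrative rows; the ratings are assumptions, not assessment results.
matrix = [
    MatrixRow("read_returns_policy", "Read the returns policy document", True, "low"),
    MatrixRow("read_calendar", "Read employee calendars", False, "medium"),
    MatrixRow("send_email", "Send email to any recipient", False, "critical"),
    MatrixRow("query_customer_records", "Query other customers' records", False, "high"),
]

priority = [r.tool for r in gaps_by_priority(matrix)]
for row in gaps_by_priority(matrix):
    print(f"{row.tool:<24} {row.max_impact:<9} {row.description}")
```

Required tools drop out of the result entirely, so what remains is the list of gap entries in the order they would be tested in the next phase.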


This article continues with deeper technical detail, screenshots, code samples, and an interactive lab walk-through. Read the full article on Securityelites — AI Red Team Education →


Top comments (1)

Truong Bui • May 25

The permission gap matrix framing is something more teams need to internalize. The "required vs granted" distinction is almost always where the real damage happens — not because anyone was careless, but because agents accumulate permissions incrementally and nobody audits the delta.

One thing worth adding to the indirect injection chain phase: MCP servers introduce a new delivery surface. When an agent connects to an MCP server, the tool descriptions themselves can carry injection payloads. The agent reads them at startup, before any user input, and the content is never shown to the human in the loop. We scanned 508 public MCP servers at MCPSafe (mcpsafe.io) and found this pattern in 18% of them — tool descriptions that contain instructions to the agent rather than descriptions for the agent. That's Phase 4 injection that bypasses the document and email delivery chains entirely.

The permission gap matrix approach maps cleanly here too: if a server's tool list includes capabilities far beyond what the stated integration purpose requires, that's a red flag worth raising before install, not after.