Mohamed AboElKheir

Posted on Jun 8 • Edited on Jun 10

Let "Claude Code" Do Your Pentesting!

#appsec #ai #cybersecurity #claude

Can AI meaningfully assist in security testing, or is it just hype? In this post, I walk through a concrete example: using Claude Code to run the same IDOR, authentication, and CSRF test cases you'd normally run manually with Burp Suite. Along the way, I look at what that actually buys you and where the limits are.

NOTE: This post is also available in video form on my YouTube channel here: https://youtu.be/JTk8brm6Zpc

The Test Case

The application under test is a simple snippet-sharing web app (source code here). The core business rule is straightforward: users can create, edit, and delete their own snippets — but not those belonging to other users. The security test case is equally straightforward: can that rule be bypassed?

This is the classic IDOR (Insecure Direct Object Reference) pattern. The browser enforces this rule at the UI level — there's no edit button for other users' snippets — but the question is whether the backend enforces it too. If I craft a PUT /snippets/:id request with someone else's snippet ID and send it directly, what happens?

We test this manually first using Burp Suite as a proxy, then repeat the exact same steps using Claude Code.

The Manual Baseline

With Burp's built-in browser routing traffic through the proxy, the test flow is:

Log in, create a snippet, then edit it. This produces a PUT /snippets/:id request in Burp's HTTP history. Find it and send it to Burp's Repeater.

From the GET /snippets response, grab a snippet ID owned by a different user (is_owner: false).

In the PUT request you sent to Repeater earlier, swap the original snippet ID for the one owned by a different user.

The response is 403 Forbidden — authorization is working. The same test on the DELETE endpoint returns 401 Unauthorized. No IDOR.
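The ID swap and the reading of the result can be sketched in a few lines of Python. This is only an illustration of the logic, not what Burp does internally: the `id` field name and the sample listing are assumptions (only `is_owner` appears in the response described above), and the helper names are hypothetical.

```python
import re

def pick_foreign_snippet_id(snippets):
    """Return the id of the first snippet marked is_owner: false, or None."""
    for snippet in snippets:
        if snippet.get("is_owner") is False:
            return snippet["id"]  # "id" is an assumed field name
    return None

def swap_snippet_id(path, foreign_id):
    """Replace the id segment of a /snippets/<id> path with foreign_id."""
    return re.sub(
        r"(/snippets/)[^/?#]+",
        lambda match: match.group(1) + str(foreign_id),
        path,
        count=1,
    )

def verdict(status):
    """Classify the response to the tampered request."""
    if 200 <= status < 300:
        return "possible IDOR: the request on a foreign snippet succeeded"
    if status in (401, 403):
        return "blocked"
    return "inconclusive: inspect the response manually"

# Hypothetical listing, shaped like the GET /snippets response described above.
listing = [{"id": 7, "is_owner": True}, {"id": 12, "is_owner": False}]
foreign_id = pick_foreign_snippet_id(listing)
print(swap_snippet_id("/snippets/7", foreign_id))
print(verdict(403))
```

A 2xx response to the tampered request is the only outcome treated as a likely finding here; anything other than 401 or 403 still deserves a manual look.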

Two more quick checks while in Repeater:

Authentication: Replace the session cookie with an invalid value → 403 Forbidden. The application correctly rejects unauthenticated requests.

CSRF protection: Replace the CSRF-Token header with an invalid value on the same mutating route → 403 Forbidden. CSRF protection is enforced.
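Both checks amount to replaying the same request with one header tampered at a time. A minimal sketch of that idea follows; the cookie name `session` is an assumption (check the real name in Burp's history), while the `CSRF-Token` header name comes from the test above.

```python
def replace_cookie(cookie_header, name, value):
    """Return cookie_header with the value of one cookie replaced."""
    parts = []
    for part in cookie_header.split(";"):
        part = part.strip()
        key, _, _old_value = part.partition("=")
        parts.append(f"{key}={value}" if key == name else part)
    return "; ".join(parts)

def tampered_variants(headers, cookie_name="session"):
    """Build one header set per check, leaving the original dict untouched."""
    bad_session = dict(headers)
    bad_session["Cookie"] = replace_cookie(
        headers.get("Cookie", ""), cookie_name, "invalid"
    )

    bad_csrf = dict(headers)
    bad_csrf["CSRF-Token"] = "invalid"

    return {"invalid_session": bad_session, "invalid_csrf": bad_csrf}

# Hypothetical baseline headers copied from a captured PUT request.
baseline = {"Cookie": "session=abc123; theme=dark", "CSRF-Token": "tok-1"}
for check, variant in tampered_variants(baseline).items():
    print(check, variant)
```

Changing one header per variant keeps the result attributable: if a variant is rejected, you know which control rejected it.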

One finding worth noting: the error responses are verbose — they expose the framework name (Express), internal file paths, and library names. Not broken authentication or broken CSRF, but an information disclosure finding that gives an attacker unnecessary context about the stack.
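A quick way to flag this kind of disclosure is to scan error bodies for tell-tale markers. The patterns below are my own guesses at what a verbose Node.js stack trace looks like, not an exhaustive or authoritative list, and the sample body is made up for illustration.

```python
import re

# Assumed markers of a leaked stack trace; tune these for the stack under test.
DISCLOSURE_PATTERNS = {
    "framework name": re.compile(r"\bexpress\b", re.IGNORECASE),
    "internal file path": re.compile(r"(/[\w.-]+)+\.js:\d+"),
    "library path": re.compile(r"node_modules/[\w@./-]+"),
}

def find_disclosures(body):
    """Return the sorted labels of every pattern found in the response body."""
    return sorted(
        label
        for label, pattern in DISCLOSURE_PATTERNS.items()
        if pattern.search(body)
    )

# Hypothetical verbose error body.
sample = (
    "ForbiddenError: invalid csrf token\n"
    "    at csrf (/app/node_modules/csurf/index.js:112:19)\n"
    "    at Layer.handle (/app/node_modules/express/lib/router/layer.js:95:5)"
)
print(find_disclosures(sample))
```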

Repeating the Same Tests with Claude Code

The goal here is to see whether Claude Code can follow the same methodology and reach the same conclusions, with less manual effort.

The Integration Architecture

Two MCP servers make this work:

Burp MCP — gives Claude Code access to Burp's HTTP history and tools, including the ability to read captured requests and send crafted HTTP requests directly. Install it from the Burp Extensions store or download from here, then add it to Claude Code:

claude mcp add --transport sse burp http://127.0.0.1:9876/ -s user

Playwright MCP gives Claude Code a browser it can control programmatically. The critical detail: it must route traffic through Burp's proxy on port 8080, so that all browser activity is captured in Burp's history the same way it would be during manual testing. To do that, pass the --proxy-server=http://127.0.0.1:8080 and --ignore-https-errors options when adding the MCP server:

claude mcp add playwright -s user -- npx @playwright/mcp@latest --proxy-server=http://127.0.0.1:8080 --ignore-https-errors

--ignore-https-errors is fine in a controlled testing environment. It should never be used in production.
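Before starting a session, it can help to confirm that both Burp listeners are actually up. The sketch below only checks that something accepts TCP connections on the two ports used in the commands above (8080 for the proxy, 9876 for the Burp MCP server); it does not verify that the listener is really Burp. The function names are hypothetical.

```python
import socket

def port_open(host, port, timeout=1.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

def check_setup(host="127.0.0.1"):
    """Report reachability of the two listeners the MCP setup relies on."""
    listeners = {"Burp proxy": 8080, "Burp MCP server": 9876}
    return {name: port_open(host, port) for name, port in listeners.items()}
```

Calling `check_setup()` returns a dictionary such as `{"Burp proxy": True, "Burp MCP server": False}`, which tells you which side to fix before blaming the prompt.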

The White-Box Advantage

One detail that matters more than it might seem: open the Claude Code session from the application's source code directory. This gives Claude Code access to both the Burp MCP tools and the application's source code simultaneously. It turns what would otherwise be a black-box test into a white-box one — Claude Code can trace findings back to the specific lines of code responsible, not just report what HTTP status code was returned.

The Prompt

Rather than giving Claude Code a vague goal, we give it the exact same steps we followed manually:

I want to test whether users can delete or update public code snippets of other users using the Burp MCP,  follow these steps:
- Open a playwright browser, login with the credentials in the README file
- create a snippet, then edit it
- get the request from burp history
- get a snippet id owned by a different user
- create a repeater tab with same request but with the snippet id of other user
- also do that using the send http tool and let me know if there is a vulnerability

This is important: the prompt isn't "test my application for IDOR." It's a codified version of the methodology we already validated manually. Claude Code is executing a defined workflow, not exploring freely.

What Claude Code Does

Claude Code reads the README to get credentials, opens the Playwright browser, logs in, creates a snippet, and edits it. At this point, Burp's HTTP history contains all of that traffic — exactly as it would if a human had done the same steps manually.

Claude Code then pulls the PUT request from Burp's history, identifies a snippet ID owned by a different user from the API response, and does two things: creates a Repeater tab in Burp with the modified request (so you can replay it yourself later), and sends it directly using Burp's send HTTP tool.

The result: 403 Forbidden. Same as the manual test.

But the more interesting part is the summary Claude Code produces:

The Playwright browser logged in, created a test snippet, retrieved the PUT request from Burp history, found a snippet ID owned by a different user, created a Repeater tab, and sent the same request using the send HTTP tool. The response was 403 Forbidden — authorization is working correctly.

Looking at the source code, authorization is enforced in [file] at line [line number], where the snippet's owner ID is compared against the authenticated user's ID before allowing the update or delete.

This is the white-box advantage made concrete: not just confirming that a 403 was returned, but pointing you directly to the code responsible for it. In a real engagement, this significantly reduces the time between "found an issue" and "here's where to fix it" — or in this case, "here's confirmation of where the protection is implemented."

Packaging the Methodology as a Skill

Once a methodology is validated, you don't want to rewrite the full prompt every time you apply it. Claude Code supports skills: reusable prompt templates you invoke by name. Ask Claude Code to create one from the session:

Create a skill called "test-idor" based on the steps we followed in this session

The next time you want to run the same IDOR test on a different endpoint or a different application, you invoke the skill with the relevant context rather than reconstructing the methodology from scratch. This is where AI-assisted pentesting moves from an interesting demo to a repeatable workflow.

What This Is Good For — and Where It Falls Short

This approach works well when:

You have a defined methodology you want to execute consistently across multiple endpoints or applications
You want the test results tied back to the source code, not just observed at the HTTP layer
You are running regression testing — verifying that a previously tested control still holds after a code change

It is less suited for:

Exploratory testing, where the value comes from a human noticing something unexpected and pulling on that thread
Complex multi-step attack chains, where each step depends on reasoning about the previous result in ways that are hard to specify upfront
Any scenario where the methodology itself is still being developed — you need to know what you're testing before you can ask Claude Code to test it

The manual and automated approaches are complementary, not competing. Manual testing is where you develop and validate the methodology. Claude Code is where you execute and scale it.
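For the regression-testing case, the manual results become the baseline that later runs are compared against. A minimal sketch, assuming you record one status code per check; the check names are hypothetical labels, and the baseline values are the status codes observed in the manual test earlier in this post.

```python
# Status codes observed during the manual baseline.
BASELINE = {
    "PUT foreign snippet": 403,
    "DELETE foreign snippet": 401,
    "invalid session cookie": 403,
    "invalid CSRF token": 403,
}

def compare_to_baseline(baseline, current):
    """Return {check: (expected, actual)} for every check that changed."""
    changed = {}
    for check, expected in baseline.items():
        actual = current.get(check)  # None means the check was not run
        if actual != expected:
            changed[check] = (expected, actual)
    return changed

# Hypothetical later run in which the PUT control has regressed.
later_run = dict(BASELINE, **{"PUT foreign snippet": 200})
print(compare_to_baseline(BASELINE, later_run))
```

Any non-empty result is a reason to look at the diff between the two code versions, which is where the source code access described earlier pays off.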

Conclusion

The combination of Burp MCP for HTTP interception and replay, Playwright MCP for browser automation, and source code access for context gives Claude Code a genuinely useful set of capabilities for security testing. It is not a replacement for a security engineer's judgment — but it is an effective tool for executing a defined security testing methodology with less manual effort and more traceability back to the code.

The pattern that makes it work is the same pattern that makes any security tooling effective: know what you're testing, define a clear methodology, and use the tool to execute it — not to substitute for thinking about it.

Let me know in the comments if you're using Claude Code for other security test cases, or if you've built skills for test cases beyond IDOR.

DEV Community