Three years ago a junior engineer on a team I was advising committed an environment file to a public GitHub repository. The file contained an AWS access key with admin permissions on a production account. The key was harvested by an automated scanner within four minutes of the commit. By the time the team noticed, an attacker had spun up 200 EC2 instances mining cryptocurrency. The bill for those four hours was $14,000.
The team had a security checklist. The checklist included a line that said "do not commit secrets to git." The line had been on the checklist for two years. It had been read by every engineer on the team. None of that mattered, because security checklists do not run themselves, and the moment of committing a file is exactly the moment when nobody has the bandwidth to consult a checklist.
I started using Claude Code for security audits because I wanted the checklist to run itself. Not as a replacement for human review, but as the first pass that catches the obvious mistakes before they reach a human reviewer or, worse, production. Here is the workflow that has caught real vulnerabilities in real codebases.
Why Security Audits Get Skipped
Most teams have security checklists. Most teams do not run them consistently. The reason is not that engineers do not care about security. The reason is that security audits feel like a tax that gets paid out of the same time budget as shipping features, and the visible reward for shipping a feature is much higher than the visible reward for catching a vulnerability that would not have been exploited for another six months.
This math is wrong, but it feels right in the moment. The cost of a missed vulnerability is theoretical and deferred. The cost of pausing to audit is concrete and immediate. So the audit gets skipped, and the vulnerability accumulates, and six months later somebody pays the deferred cost in cash and reputation.
The second reason security audits get skipped is that they are tedious. A real audit means reading every line of new code with a paranoid mindset. It means thinking about what an attacker could do with each input, each query, each file path. It means imagining failure modes that have not happened yet. This is exhausting work, and humans are bad at sustaining it for long stretches.
A security audit is the highest-leverage hour you can spend on a codebase, and it is also the hour engineers are least motivated to spend, because the work is invisible when it succeeds and only visible when it fails.
Claude Code does not get tired. Claude Code does not get bored. Claude Code can read every line of a diff with the same paranoid mindset on the hundredth file as on the first. That is exactly the kind of work where automation pays off.
The Pre-Commit Audit Skill
The first skill I built is a pre-commit audit. It runs on the staged diff before I commit and flags anything that looks like a security risk. The skill has a list of patterns it looks for and a list of file types it pays extra attention to.
The patterns it looks for include hardcoded credentials of any kind, calls to dangerous functions like eval and exec with user input, SQL queries built by string concatenation, file paths constructed from user input without validation, deserialization of untrusted data, and authentication checks that are missing, bypassable, or applied inconsistently.
The file types it pays extra attention to include environment files, configuration files, anything that looks like it might contain credentials, anything in an authentication or authorization module, and anything that handles user uploads.
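To make the pattern side concrete, here is a minimal sketch of a regex first pass over the staged diff. The real skill is a prose description that Claude Code interprets, so these patterns and function names are illustrative assumptions, not the actual implementation:

```python
import re
import subprocess

# Illustrative risk patterns; a real skill describes these in prose and
# lets the model reason about context, not just match regexes.
PATTERNS = {
    "hardcoded AWS key": re.compile(r"AKIA[0-9A-Z]{16}"),
    "eval/exec on dynamic input": re.compile(r"\b(eval|exec)\s*\("),
    "SQL built by concatenation": re.compile(
        r"(SELECT|INSERT|UPDATE|DELETE)\b.*['\"]\s*\+", re.IGNORECASE),
}

def scan_diff_text(diff_text):
    """Return (pattern label, offending line) pairs for added lines only."""
    findings = []
    for line in diff_text.splitlines():
        if not line.startswith("+") or line.startswith("+++"):
            continue  # only newly added lines introduce new risk
        for label, pattern in PATTERNS.items():
            if pattern.search(line):
                findings.append((label, line[1:].strip()))
    return findings

def scan_staged():
    """Scan whatever is currently staged for commit."""
    diff = subprocess.run(
        ["git", "diff", "--cached", "--unified=0"],
        capture_output=True, text=True, check=True,
    ).stdout
    return scan_diff_text(diff)
```

A regex pass like this catches the crude cases cheaply; the skill adds judgment on top, like noticing a credential that does not match any known format.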
When the skill flags something, it explains what the risk is and what the fix would look like. It does not block the commit. It just tells me what it found, and I decide whether to address the issue or proceed. Most of the time the issue is real and worth fixing. Sometimes the issue is a false positive, and I commit anyway. The skill is calibrated to err on the side of flagging too much rather than too little, because a false positive costs me 30 seconds and a missed vulnerability could cost me $14,000.
The skill caught a hardcoded API key in a test file last month. The test file was meant to use a mocked credential, but somebody had pasted a real key into the test while debugging and forgotten to remove it. The commit would have gone to a public repository. The skill flagged it before I pushed, and I cleaned it up.
The Dependency Audit Skill
The second skill audits dependencies. Modern applications include hundreds or thousands of transitive dependencies, and any of them could be compromised. The dependency audit skill cross-references my package manifest against published vulnerability databases and flags packages with known issues.
This is not a novel idea. Tools like npm audit and pip-audit do something similar. What the Claude Code version adds is context. When npm audit tells me there is a high-severity vulnerability in a transitive dependency, I have to figure out whether the vulnerable code path is actually reachable from my code, whether the fix requires a major version bump that will break things, and whether the risk is actually material to my application or just theoretical.
Claude Code reads the vulnerability description, looks at how the dependency is used in my code, and gives me an honest assessment. Sometimes the answer is "this is a real risk, fix it now." Sometimes the answer is "this vulnerability requires the attacker to control the input to a function you do not call, so it is not exploitable in your application." Sometimes the answer is "this vulnerability is real and exploitable, but the fix requires upgrading three other packages first, so you should plan a separate sprint."
The contextual assessment is the part that matters. A list of vulnerabilities is overwhelming. A prioritized list of vulnerabilities with reasoning attached is actionable.
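As a sketch of that prioritization step, the following cross-references audit findings against the modules a codebase actually imports. The report shape mirrors pip-audit's JSON output, but treat it as an assumption and check your tool's version; the import heuristic is a crude stand-in for the reachability reasoning the skill does in prose:

```python
import json
import subprocess

def triage(report, imported_modules):
    """Pair each vulnerability with whether the package is imported directly.

    Assumes a pip-audit-style report: {"dependencies": [{"name", "vulns"}]}.
    Direct imports are only a rough proxy for reachability.
    """
    results = []
    for dep in report.get("dependencies", []):
        module = dep["name"].replace("-", "_")
        for vuln in dep.get("vulns", []):
            results.append({
                "id": vuln["id"],
                "package": dep["name"],
                "directly_imported": module in imported_modules,
            })
    # Directly imported packages first: most likely to be reachable.
    return sorted(results, key=lambda r: not r["directly_imported"])

def run_audit(imported_modules):
    """Run pip-audit and triage its findings (requires pip-audit installed)."""
    out = subprocess.run(["pip-audit", "--format", "json"],
                         capture_output=True, text=True).stdout
    return triage(json.loads(out), imported_modules)
```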
The Authentication Flow Skill
The third skill audits authentication and authorization flows. This is the highest-stakes area of most applications and the area where mistakes are most likely to happen, because authentication code looks similar across applications and engineers tend to copy patterns from previous projects without checking whether the patterns still apply.
The authentication audit skill looks at every endpoint and asks: who is allowed to call this endpoint, and how is that enforced? It traces the authentication middleware, looks at the authorization checks, and verifies that the checks are present, correct, and not bypassable.
Common issues the skill catches include endpoints that are missing authorization checks entirely, endpoints where the authorization check uses the wrong identifier, endpoints where the authorization check happens after a side effect has already occurred, endpoints where the authorization logic is correct in one place but wrong in another, and endpoints where the authorization can be bypassed by malformed input.
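The "who can call this, and how is that enforced" question can be sketched with a toy route registry. The registry and auth marker here are assumptions; a real skill traces whatever middleware your framework uses:

```python
# Framework-agnostic sketch: every route must declare its auth requirement
# explicitly, so a missing declaration is detectable rather than silent.
ROUTES = {}

def route(path, auth_required):
    def register(fn):
        ROUTES[path] = {"handler": fn, "auth": auth_required}
        return fn
    return register

@route("/admin/users", auth_required=True)
def list_users():
    return []

@route("/admin/export", auth_required=False)  # drift: copied without auth
def export_users():
    return []

def unprotected(routes, prefix="/admin"):
    """Flag sensitive routes registered without an auth requirement."""
    return [path for path, meta in routes.items()
            if path.startswith(prefix) and not meta["auth"]]
```

The design choice worth copying is the explicit declaration: an endpoint that never states its auth policy is itself a finding, which is how copy-paste drift gets caught.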
I run this skill against every authentication-related PR. It has caught issues that would have shipped to production in two of the last twelve PRs. Both issues were the result of an engineer copying a pattern from a different endpoint without realizing that the new endpoint had different authorization requirements. Both would have been hard to catch in code review because the code looked correct.
The Secrets Scan Skill
The fourth skill scans the entire repository for secrets. This is more aggressive than the pre-commit audit, which only looks at the staged diff. The secrets scan looks at every file, every commit in the history, and every branch.
The skill looks for high-entropy strings that match known credential patterns, environment files that have been committed even if they are now gitignored, hardcoded passwords in test data, API keys in documentation examples, and credentials embedded in deployment scripts.
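The high-entropy part is mechanical enough to sketch directly. The threshold and token pattern below are assumptions to tune per codebase; base64-ish blobs score high, while repeated or dictionary-like strings score low:

```python
import math
import re

def shannon_entropy(s):
    """Bits of entropy per character of the string."""
    probs = [s.count(c) / len(s) for c in set(s)]
    return -sum(p * math.log2(p) for p in probs)

# Long runs of credential-alphabet characters are candidate secrets.
TOKEN = re.compile(r"[A-Za-z0-9+/=_\-]{20,}")

def high_entropy_tokens(text, threshold=4.0):
    """Return tokens long enough and random-looking enough to be secrets."""
    return [t for t in TOKEN.findall(text)
            if shannon_entropy(t) >= threshold]
```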
When the skill finds something in git history, the fix is more involved than just removing the file. The credential needs to be rotated, because anyone who cloned the repository while the credential was visible could still extract it. Then the history needs to be cleaned up, which requires a force push and coordination with everyone who has the repository checked out.
The skill produces a report with the findings sorted by severity and a runbook for each finding that explains what to do. The runbook includes the rotation procedure, the history cleanup procedure, and a list of stakeholders to notify. This is the kind of detail that a generic secrets scanner does not include, and it is the part that turns a finding into a fix.
The Input Validation Skill
The fifth skill audits input validation across the application. The skill identifies every place where the application accepts external input and verifies that the input is being validated before it is used.
External input includes HTTP request parameters, file uploads, environment variables, configuration files loaded at runtime, message queue payloads, and data read from third-party APIs. Each of these is a place where untrusted data enters the system, and each needs to be validated before it is used in a sensitive operation.
The skill looks for input that flows into database queries, file system operations, command execution, deserialization, template rendering, and HTTP requests to other services. For each flow, the skill verifies that the input has been validated against an explicit schema and rejected if it does not match.
The most common issue the skill catches is input that is validated in one path and not in another. An engineer adds a new endpoint that calls an existing function. The existing function assumes its input has already been validated, because the original caller validated it. The new endpoint does not validate, because the engineer assumed the function would handle it. The result is an injection vulnerability that could not have been caught by reading either function in isolation.
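The fix for that cross-path gap is to validate inside the shared function, not only in one caller, and to bind parameters rather than concatenate. A minimal sketch, with an illustrative username schema:

```python
import re
import sqlite3

# Assumed schema for this example: short lowercase identifiers only.
USERNAME = re.compile(r"[a-z0-9_]{3,32}")

def fetch_user(conn, username):
    """Validate at the function boundary, then use a parameterized query.

    Validating here (not only in the original caller) closes the gap where
    a new endpoint reuses this function with unvalidated input.
    """
    if not USERNAME.fullmatch(username):
        raise ValueError(f"invalid username: {username!r}")
    # Placeholder binding, never string concatenation:
    return conn.execute(
        "SELECT id, username FROM users WHERE username = ?", (username,)
    ).fetchone()
```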
The Configuration Audit Skill
The sixth skill audits configuration. Configuration is where security defaults turn into security disasters, because configuration changes do not go through the same review as code changes and the people who make them often do not understand the implications.
The configuration audit skill looks at infrastructure as code, deployment manifests, environment configuration, feature flag definitions, and any file that controls how the application behaves at runtime. It checks for common misconfigurations like overly permissive IAM policies, public S3 buckets that should be private, security groups that allow access from anywhere, debug mode enabled in production, default credentials that have not been changed, and encryption disabled where it should be enabled.
The skill is calibrated for the specific cloud provider and infrastructure stack I use, so it understands the difference between a configuration that is correct for development and one that would be a disaster in production. When it flags something, it tells me whether the issue is hypothetical or material, and what the fix looks like.
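A few of those checks can be sketched as rules over a parsed manifest. The resource shapes below are assumptions about what your infrastructure-as-code parses into (e.g. a Terraform plan JSON loaded into dicts), not any provider's real schema:

```python
# Each rule is a (label, predicate) pair over one parsed resource dict.
RULES = [
    ("security group open to the world",
     lambda r: r.get("type") == "security_group"
     and "0.0.0.0/0" in r.get("ingress_cidrs", [])),
    ("debug mode enabled",
     lambda r: r.get("env", {}).get("DEBUG") in ("1", "true", "True")),
    ("bucket publicly readable",
     lambda r: r.get("type") == "s3_bucket"
     and r.get("acl") == "public-read"),
]

def audit_config(resources):
    """Return (resource name, finding) pairs for every rule that fires."""
    findings = []
    for res in resources:
        for label, check in RULES:
            if check(res):
                findings.append((res.get("name", "?"), label))
    return findings
```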
How the Skills Compose
The skills are designed to compose. I run the pre-commit audit on every commit. I run the dependency audit weekly. I run the authentication flow audit on every PR that touches auth-related code. I run the secrets scan monthly across the full history. I run the input validation audit on any PR that adds new endpoints. I run the configuration audit before any deployment to production.
This composition is the part that matters. A single audit run catches the issues that are present at one moment. A continuous audit pipeline catches issues as they are introduced, before they accumulate into a backlog that nobody has time to address.
The pipeline has a meta-rule attached. If any audit flags something at high severity, the relevant deployment is blocked until the issue is addressed or explicitly waived. The waiver requires a written explanation of why the issue is acceptable, which goes into a record that gets reviewed periodically. This means that when an issue is waived, it is waived deliberately, not by accident.
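The gate itself is a few lines. The finding and waiver shapes are assumptions for illustration; the point is that a waiver is an explicit record, not a silent skip:

```python
def deployment_gate(findings, waivers):
    """Block deployment on any unwaived high-severity finding.

    findings: list of {"id": str, "severity": str}
    waivers: set of finding ids with a written justification on file
    """
    blocking = [f for f in findings
                if f["severity"] == "high" and f["id"] not in waivers]
    return {"allowed": not blocking, "blocking": blocking}
```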
What the Skills Do Not Catch
I want to be honest about the limits. The skills catch the kind of issue that has a known pattern and shows up in a recognizable shape. They do not catch novel vulnerabilities, business logic flaws, or issues that require deep understanding of the application's threat model.
Examples of what the skills miss include race conditions in business logic that allow value extraction, authorization checks that are technically correct but enforce the wrong policy, side channels that leak information through timing or error messages, and chained vulnerabilities where each individual issue is low severity but the combination is high severity.
For these classes of issue, you still need human review. What the skills do is reduce the volume of low-hanging issues so that human review can focus on the hard problems. If a human reviewer spends 80% of their time catching the mechanical issues a pattern scan could have found, they have 20% left for the issues that actually require their judgment. Flip that ratio, and the audit becomes valuable.
Setting Up the Skills
If you want to build something similar, the structure is straightforward. Each skill is a markdown file that describes what to look for, what to flag, and how to format the report. The skill reads the relevant inputs, looks for the patterns, and produces a report.
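If it helps to see the shape, here is a minimal skill file of the kind described above. The sections are illustrative; adapt them to your own stack:

```markdown
# Skill: pre-commit security audit

## When to run
On the staged diff, before every commit.

## What to look for
- Hardcoded credentials of any kind (API keys, passwords, tokens)
- eval/exec or equivalent called with user-controlled input
- SQL queries built by string concatenation
- File paths constructed from unvalidated user input

## What to flag
For each finding: the file and line, the risk, and a suggested fix.
Err on the side of flagging too much. Do not block the commit.

## Report format
A severity-sorted list, one finding per line, fix suggestion indented below.
```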
The skills are stored alongside the codebase and version-controlled. When the codebase changes in a way that affects the security model, the skills change too. When a new attack surface is added, a new skill is added. When an existing skill produces too many false positives, it is tuned. The skills are living documents, not a one-time setup.
The most important thing is to run the skills consistently. A skill that runs every commit catches issues. A skill that runs once a quarter catches a backlog. The whole point of automation is to remove the human decision about whether to run the audit, and that only works if the audit runs every time.
What This Workflow Costs
The skills took about a day to write initially. Tuning them took another two days spread over the first month, as I saw which patterns produced false positives and which patterns missed real issues. Maintenance takes about an hour a month.
The time saved is harder to measure, because the value of catching a vulnerability is the cost of the breach that did not happen, and you cannot measure something that did not happen. What I can measure is that I no longer skip security audits, because the cost of running them is now measured in seconds rather than hours. The audits have caught real issues that would have shipped to production. The math was always overwhelming; now it actually plays out in practice.
The Bigger Pattern
There is a bigger pattern here that goes beyond security audits. The pattern is that any kind of work that is high-stakes and tedious tends to get skipped, and the skipping accumulates costs that show up later. Code review skipped because it is tedious leads to bugs. Documentation skipped because it is tedious leads to onboarding pain. Security audits skipped because they are tedious lead to breaches.
The pattern for fixing this is the same in each case. Find the part of the work that is mechanical and automate it. Use the time saved to do the part that requires human judgment. Refuse to skip the work entirely, because the math is overwhelming if you account for the deferred costs.
Claude Code is a tool for executing this pattern. It is not a replacement for engineering judgment. It is a way to make sure the tedious 80% of the work gets done so that the engineering judgment can be applied to the 20% that needs it.
If you want to apply this pattern to your own codebase, the place to start is to pick one audit skill and run it. Pick the one that matches your biggest current risk. If you have ever committed a secret, start with the secrets scan. If your authentication is complex, start with the auth flow audit. If your dependency tree is deep, start with the dependency audit. Run it once. See what it finds. Fix what is real. Then schedule it to run continuously.
The first audit will probably find issues that have been sitting in your codebase for months. That is normal. The second audit will find fewer. By the third or fourth iteration, the audit becomes a regular checkpoint rather than a fire drill, and that is when the workflow starts paying back the time you put into it.
FAQ
How do I get started? Pick one audit skill that matches your biggest risk. Write a markdown file that describes what to look for and what to flag. Run it on your codebase. Tune it based on the results.
Do I still need professional security testing? Yes. The audit skills catch the patterns that are easy to encode. They do not catch novel vulnerabilities or business logic issues. Use them as the first line of defense, not the only line.
What about false positives? False positives are a cost. The way to reduce them is to tune the patterns, narrow the scope, and add suppression rules for known-safe cases. Aim for high precision over high recall on issues that block deployment.
How often should I run the audits? Pre-commit audits should run on every commit. Dependency audits weekly. Secrets scans monthly. Configuration audits before every production deployment.
Will this work for my language and framework? The pattern works for any language. The specific patterns depend on the language and framework. Customize the skills for your stack.
If you found this useful, follow for more posts about practical Claude Code workflows. I write about how I run a multi-product business with AI agents handling most of the operational work.