Wael Matoussi

Posted on May 29

The problem with security scanners isn't the scanning

#security #ai #devops #webdev

At a previous job where I worked as a developer, someone ran Semgrep on our codebase for the first time. It came back with 180 findings. We had no security engineer. The developer who ran it looked at the output, closed the terminal, and we never ran it again.

That's not a story about a careless team. That's a story about a tool that produced output most small teams without security expertise wouldn't know what to do with.

I've seen this exact moment happen more times than I can count while working with small dev teams, and it's the reason I spent the last year building SecOpsium. But before I get to that, let me explain the actual problem, because I think it's misunderstood.

The problem isn't the scanning

Semgrep and Gitleaks are excellent tools. They're free, actively maintained, and genuinely powerful. If you're not using them you should be.

The problem is what happens 10 minutes after you run them.

You get 200 findings. Some are critical. Some are in test files. Some are in commented-out code from 2021. Some are legitimate secrets. Some are variable names that match a rule's pattern but contain nothing sensitive. They all look the same in the output.

Now you're a developer who also does DevOps, also reviews PRs, also handles incidents, and you're staring at 200 items with no clear indication of which three actually matter this week.

So you close the terminal. Or you create a Jira ticket labeled "security findings" that lives in the backlog forever. Or you spend two days triaging manually and burn out before you fix anything.

This is the real problem. The scanning was never the hard part.

Why rule-based scanners produce so much noise

It helps to understand technically why this happens.

Semgrep and Gitleaks are rule-based: they match patterns. A variable named api_key_example in a test file gets flagged the same way as a live Stripe key in an active production config. Gitleaks looks for known credential formats and high-entropy strings, but it can't distinguish a key that was rotated and revoked six months ago from one that's live right now. Neither tool knows how your code actually runs: they treat a commented-out credential the same as an active one sitting in a file that gets loaded on every request.
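To make the noise problem concrete, here is a toy pattern rule run over two files. The rule, the file paths and the key values are all hypothetical (this is not an actual Semgrep or Gitleaks rule); the point is only that a pure pattern match produces two findings of identical shape, one in a test fixture and one in production config.

```python
import re

# Hypothetical rule in the spirit of a pattern-based secret scanner (not a real
# Semgrep or Gitleaks rule): any *api_key* name assigned a quoted string.
API_KEY_RULE = re.compile(r"""(?i)\b\w*api_key\w*\s*=\s*["']([^"']+)["']""")

# Two hypothetical files: a test fixture and a production config.
FILES = {
    "tests/test_billing.py": 'api_key_example = "sk_test_not_a_real_key"',
    "config/production.py": 'STRIPE_API_KEY = "sk_live_not_a_real_key"',
}

def scan(files: dict[str, str]) -> list[tuple[str, str]]:
    """Return (path, matched value) for every line that matches the rule."""
    findings = []
    for path, text in files.items():
        for line in text.splitlines():
            match = API_KEY_RULE.search(line)
            if match:
                findings.append((path, match.group(1)))
    return findings

for path, value in scan(FILES):
    # Both findings have the same shape: the rule has no idea which file is
    # loaded in production or whether the key is still valid.
    print(f"{path}: possible API key {value!r}")
```

Nothing in the output tells you that only the second line is worth waking someone up for. That judgment needs context the rule doesn't have.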

The result is a signal-to-noise ratio that makes the output nearly unusable for anyone who isn't already a security engineer. The cruel irony is that developers learn to ignore it, which is worse than not scanning at all, because now you have false confidence on top of real exposure.

Why this got significantly worse in the last 12 months

The noise problem existed before AI coding tools. What's changed is the volume and the nature of how secrets end up in code.

When a developer writes code manually and commits a secret, there's usually a story: they were testing something, they forgot to move it to .env, they copied it from a tutorial. The mistake has a human fingerprint and a human pace.

AI agents work differently.

A developer using Cursor, Copilot, or any agentic tool describes what they want, and the agent writes it, sometimes thousands of lines at a time. The agent's job is to complete the task: "integrate Stripe payments," "add AWS S3 uploads," "connect to the database." It does that. And sometimes completing the task means pulling in a credential from wherever it can find one, because that's what makes the code run.

Two patterns I've been seeing more frequently:

The unfilled placeholder that ships.
Agents generate credential placeholders: YOUR_API_KEY_HERE, sk_test_REPLACE_ME, INSERT_DB_PASSWORD. These are meant to be replaced. Sometimes they aren't. The developer sees everything working locally, because their .env overrides the value, and doesn't notice that the placeholder made it into the committed config file.
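An unfilled placeholder is cheap to catch before it's committed. A minimal sketch, assuming your config is plain text; the pattern list is hypothetical, built only from the three examples above, and you'd extend it with whatever your own agent tends to generate.

```python
import re

# Hypothetical placeholder shapes, taken from the examples in the text.
PLACEHOLDER_PATTERNS = [
    re.compile(r"YOUR_[A-Z_]*HERE"),
    re.compile(r"REPLACE_ME"),
    re.compile(r"INSERT_[A-Z_]+"),
]

def find_placeholders(path: str, text: str) -> list[tuple[str, int, str]]:
    """Return (path, line number, line) for each line containing a placeholder."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if any(p.search(line) for p in PLACEHOLDER_PATTERNS):
            hits.append((path, lineno, line.strip()))
    return hits

# A hypothetical committed config file that still contains placeholders.
committed_config = """\
DATABASE_URL=postgres://localhost/app
STRIPE_SECRET_KEY=sk_test_REPLACE_ME
DB_PASSWORD=INSERT_DB_PASSWORD
"""

for hit in find_placeholders("config/app.env", committed_config):
    print(hit)
```

Run against staged files in a pre-commit hook, a check like this flags the placeholder before the local .env can hide it.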

The forgotten context.
A developer pastes an error message into their AI tool to debug something. The error message contains a connection string with credentials. The agent uses that connection string in the fix it generates. The fix gets committed. Nobody notices, because the code works and the PR is 47 files long.
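Connection strings with a password in the userinfo part (user:password@host) have a recognizable shape, so this case is worth checking for explicitly. A rough sketch, assuming URL-style connection strings; the regex is an illustrative assumption, not a complete list of formats. It uses the standard library's `urlsplit` to pull out the password, and it masks the password in its report.

```python
import re
from urllib.parse import urlsplit

# Rough pattern for URL-style connection strings (postgresql://, mysql://,
# mongodb+srv://, redis://, ...). Illustrative, not exhaustive.
URL_PATTERN = re.compile(r"\b[a-z][a-z0-9+.-]*://[^\s'\"]+", re.IGNORECASE)

def urls_with_embedded_password(text: str) -> list[str]:
    """Return masked URLs from `text` that carry a password in their userinfo."""
    found = []
    for candidate in URL_PATTERN.findall(text):
        parts = urlsplit(candidate)
        if parts.password:
            # Report scheme, user and host only, so the finding itself doesn't
            # re-leak the password into logs or tickets.
            found.append(f"{parts.scheme}://{parts.username}:***@{parts.hostname}")
    return found

# A hypothetical line from an AI-generated fix.
generated_fix = 'engine = create_engine("postgresql://app:s3cr3t@db.internal:5432/prod")'
print(urls_with_embedded_password(generated_fix))
```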

The vibe-coding reality (and this isn't a criticism, it's just accurate) is that when an AI is writing thousands of lines, you stop reading every line. You read the diff at a high level, you check that it works, you ship it. The security review that might have caught a hardcoded key in 50 lines of handwritten code doesn't scale to 500 lines of generated code.

The scanners weren't designed for this. They were designed for humans making human-speed mistakes. A false-positive rate that was already a problem at human commit velocity becomes genuinely unmanageable when an agent is committing 10x the code per day.

What actually separates signal from noise

This is the part that rule-based scanners don't answer, and it's the part that actually matters.

Is it reachable?
A secret in a file that's never loaded in your production environment is a different risk from one in an active config that gets loaded on every request.

Is it live?
A rotated, revoked key sitting in git history is noise. An active key with valid permissions is a fire. These look identical to a pattern-based scanner.

Is it exposed?
A secret in server-side code is a different exposure level from one baked into a frontend JavaScript bundle that gets served to every browser that visits your site. Most developers don't know that environment variables prefixed with REACT_APP_ or VITE_ get compiled into your frontend build output wherever client code references them. At that point they're not server-side secrets anymore; they're public.
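A quick way to catch this before a build is to look at the names in your .env file. A minimal sketch, assuming a simple NAME=value .env format; the list of "sensitive-sounding" name fragments is a hypothetical heuristic, and it only flags names, so a secret stored under an innocent-looking name would slip through.

```python
import re

# Prefixes called out in the text: Create React App inlines REACT_APP_* and
# Vite exposes VITE_* to client code at build time.
PUBLIC_PREFIXES = ("REACT_APP_", "VITE_")

# Hypothetical heuristic: name fragments that suggest a value should stay server-side.
SENSITIVE_NAME = re.compile(r"SECRET|PRIVATE|PASSWORD|TOKEN|API_KEY", re.IGNORECASE)

def publicly_bundled_secrets(dotenv_text: str) -> list[str]:
    """Return variable names that can end up in the frontend build and whose
    names suggest they hold a secret."""
    flagged = []
    for line in dotenv_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue  # skip blanks, comments and malformed lines
        name = line.split("=", 1)[0].strip()
        if name.startswith(PUBLIC_PREFIXES) and SENSITIVE_NAME.search(name):
            flagged.append(name)
    return flagged

env = """
VITE_API_BASE_URL=https://api.example.com
VITE_STRIPE_SECRET_KEY=sk_live_not_a_real_key
REACT_APP_ANALYTICS_ID=UA-000000
STRIPE_SECRET_KEY=sk_live_not_a_real_key
"""
print(publicly_bundled_secrets(env))
```

Only VITE_STRIPE_SECRET_KEY is flagged here: the unprefixed STRIPE_SECRET_KEY stays server-side, and the prefixed base URL isn't a secret.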

What's the blast radius?
An exposed read-only analytics API key is a different conversation from an AWS key with admin permissions or a live Stripe secret key. Treating them the same in a findings list is how developers learn to ignore findings lists.

None of the free scanning tools answer these questions. They find. They don't evaluate.
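Once you have answers to the four questions, turning them into a decision is simple. Getting the answers is the hard part: "live", for example, would need a provider-specific validation call, which this sketch skips. Here is a minimal sketch of the bucketing, with hypothetical field names and bucket labels.

```python
from dataclasses import dataclass

@dataclass
class Finding:
    # Hypothetical fields, one per question above. This sketch assumes they
    # have already been answered somehow.
    path: str
    reachable: bool       # loaded by code that runs in production?
    live: bool            # still valid, i.e. not rotated or revoked?
    public: bool          # shipped to browsers or otherwise publicly exposed?
    high_privilege: bool  # large blast radius (admin, write, payments)?

def triage(f: Finding) -> str:
    """Bucket a finding using the four questions."""
    if not f.live:
        return "noise"          # revoked key in history: clean up when convenient
    if f.public or (f.reachable and f.high_privilege):
        return "fire"           # live and public, or live with a big blast radius
    if f.reachable:
        return "fix this week"
    return "review"             # live but apparently unused: rotate and remove

findings = [
    Finding("tests/fixtures.py", reachable=False, live=False, public=False, high_privilege=False),
    Finding("src/config.ts", reachable=True, live=True, public=True, high_privilege=False),
    Finding("infra/deploy.sh", reachable=True, live=True, public=False, high_privilege=True),
    Finding("src/analytics.py", reachable=True, live=True, public=False, high_privilege=False),
]
for f in findings:
    print(f"{triage(f):>13}  {f.path}")
```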

The SME reality that nobody in enterprise security talks about

I used to work at an SME. The security tools that existed were built for companies with dedicated security teams, six-figure budgets, and months of onboarding time. They required professional services to implement. They assumed someone on your team already spoke the language.

We didn't have any of that. We had developers who cared about doing things right but had no practical path from "we should be more secure" to "here are three specific things to fix this week."

The gap between those two states is where most small teams live permanently. The tooling exists. The knowledge of what to do with the output doesn't.

What a small team actually needs isn't more findings. It's:

What is genuinely dangerous right now
What is probably noise
What to fix first and why
Explained in plain English without assuming security expertise

That's the gap I've been trying to close.

What I built and why it's still not perfect

SecOpsium uses Semgrep and Gitleaks under the hood; I'm not reinventing scanning. What I built is the layer that sits between the raw scanner output and the developer who has to act on it.

The ML validation layer is trained to reduce false positives using context that rule-based scanners ignore: file type, variable naming patterns, whether the value has the entropy profile of a real credential, whether the file is in a test directory, and a dozen other signals. It doesn't eliminate false positives. It reduces them significantly enough that the output becomes actionable. The model is still being improved, and I'll keep improving it; I'm honest about that.
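SecOpsium's trained model isn't shown here. As a rough illustration of the kind of context signals listed above, this sketch computes three of them as plain booleans: Shannon entropy of the value, a test-directory check, and a placeholder-word check. The word list, the directory names and the 3.5 bits-per-character threshold are all assumptions for illustration, not the model's actual features or weights.

```python
import math
import re
from collections import Counter
from pathlib import PurePosixPath

def shannon_entropy(value: str) -> float:
    """Bits of entropy per character, based on character frequencies."""
    if not value:
        return 0.0
    counts = Counter(value)
    n = len(value)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())

# Hypothetical word list and directory names.
PLACEHOLDER_WORDS = re.compile(r"example|dummy|fake|test|changeme|replace|your_", re.IGNORECASE)
TEST_DIRS = {"test", "tests", "__tests__", "fixtures", "spec"}

def context_signals(path: str, name: str, value: str) -> dict[str, bool]:
    """A few context features of the kind described above, as booleans."""
    parts = {p.lower() for p in PurePosixPath(path).parts}
    return {
        "in_test_dir": bool(parts & TEST_DIRS),
        "placeholder_name_or_value": bool(PLACEHOLDER_WORDS.search(name + " " + value)),
        # 3.5 bits/char and 20 chars are arbitrary illustrative thresholds.
        "credential_like_entropy": shannon_entropy(value) > 3.5 and len(value) >= 20,
    }

print(context_signals("tests/test_api.py", "api_key_example", "sk_test_REPLACE_ME"))
print(context_signals("src/payments.py", "STRIPE_KEY", "Q7mZp2Xk9LwR4tVb8NcY3HsJ"))
```

In a real system, features like these would feed a classifier rather than a hand-written rule. The sketch only shows that each signal is cheap to compute from context the pattern match already has.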

The priority queue ranks findings based on severity, exposure level, and fixability. Not everything flagged is equal. The output tells you what to look at first and gives you enough context to understand why without needing a security background to interpret it.
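A priority queue ordered by severity, then exposure, then fixability can be sketched with the standard library's heapq. The ordinal scales, the effort estimate and the example findings below are hypothetical; this post doesn't describe SecOpsium's actual weights.

```python
import heapq
from dataclasses import dataclass

# Hypothetical ordinal scales: higher means worse.
SEVERITY = {"critical": 3, "high": 2, "medium": 1, "low": 0}
EXPOSURE = {"public": 2, "production": 1, "internal": 0}

@dataclass
class RankedFinding:
    title: str
    severity: str
    exposure: str
    fix_minutes: int  # rough effort estimate; quicker fixes win ties

def priority_key(f: RankedFinding) -> tuple[int, int, int]:
    # heapq is a min-heap, so negate the "higher is worse" scales.
    return (-SEVERITY[f.severity], -EXPOSURE[f.exposure], f.fix_minutes)

def fix_order(findings: list[RankedFinding]) -> list[str]:
    # The index breaks ties so two findings are never compared directly.
    heap = [(priority_key(f), i, f) for i, f in enumerate(findings)]
    heapq.heapify(heap)
    return [heapq.heappop(heap)[2].title for _ in range(len(heap))]

queue = [
    RankedFinding("Verbose error pages in prod config", "medium", "production", 10),
    RankedFinding("Live Stripe key in frontend bundle", "critical", "public", 30),
    RankedFinding("AWS admin key in deploy script", "critical", "production", 45),
    RankedFinding("Old token in test fixture", "low", "internal", 5),
]
for title in fix_order(queue):
    print(title)
```

For a one-off list, `sorted(findings, key=priority_key)` gives the same order. A heap pays off when findings keep arriving from scheduled scans and you only ever need the next one.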

JS exposure scanning catches credentials that end up in frontend bundles. It's a genuinely underappreciated pattern, and it's getting more common as AI agents write full-stack code without distinguishing between what stays server-side and what ends up in the browser.

Config auditing goes beyond secrets, looking at configuration patterns that create risk even when no credential is directly exposed.
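This post doesn't list which configuration patterns SecOpsium checks. The three rules below (debug mode on, CORS open to any origin, TLS verification off) are hypothetical examples of risky configuration that leaks no credential. The sketch shows the shape of a config audit over an already-parsed config dict.

```python
# Hypothetical rules: (description, predicate over a parsed config dict).
# The key names are assumptions; real frameworks use their own names.
RULES = [
    ("Debug mode enabled", lambda cfg: cfg.get("DEBUG") is True),
    ("CORS allows any origin", lambda cfg: "*" in cfg.get("CORS_ALLOWED_ORIGINS", [])),
    ("TLS verification disabled", lambda cfg: cfg.get("VERIFY_SSL") is False),
]

def audit_config(cfg: dict[str, object]) -> list[str]:
    """Return the description of every rule the config violates."""
    return [desc for desc, check in RULES if check(cfg)]

prod = {"DEBUG": True, "CORS_ALLOWED_ORIGINS": ["*"], "VERIFY_SSL": True}
print(audit_config(prod))
```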

Scheduled and automated scans mean you don't have to remember to run it. For a small team that's not thinking about security every day, this matters more than any feature.
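If you want scheduled scans without any product, a small wrapper around the same two open-source scanners is enough. A minimal sketch, assuming both CLIs are installed. The flags match the Gitleaks v8 `detect` subcommand and the Semgrep CLI (`--config auto`, `--json`, `--output`), but newer Gitleaks releases also offer `git`/`dir` subcommands, so check the flags against the versions you have. The report paths are hypothetical.

```python
import shutil
import subprocess
from datetime import date
from pathlib import Path

def scan_commands(repo: Path, out_dir: Path, day: date) -> list[list[str]]:
    """Build the scanner invocations for one scheduled run."""
    stamp = day.isoformat()
    return [
        ["gitleaks", "detect", "--source", str(repo),
         "--report-format", "json",
         "--report-path", str(out_dir / f"gitleaks-{stamp}.json")],
        ["semgrep", "scan", "--config", "auto", "--json",
         "--output", str(out_dir / f"semgrep-{stamp}.json"), str(repo)],
    ]

def run_scheduled_scan(repo: Path, out_dir: Path) -> None:
    """Run each scanner that is installed, writing dated JSON reports."""
    out_dir.mkdir(parents=True, exist_ok=True)
    for cmd in scan_commands(repo, out_dir, date.today()):
        if shutil.which(cmd[0]) is None:
            print(f"skipping {cmd[0]}: not installed")
            continue
        # Gitleaks exits non-zero when it finds leaks, so don't treat that as a crash.
        subprocess.run(cmd, check=False)
```

Call `run_scheduled_scan(Path("."), Path("scan-reports"))` from a small script, then schedule that script with a weekly cron entry or a scheduled CI job.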

What it doesn't do yet: it won't catch everything. The ML layer has blind spots I'm still mapping. Some finding categories need more work. There are parts of the interface that aren't ready to be shown yet. This is an alpha, and I'm treating it as one: shipping it early, watching what real usage surfaces, and fixing things fast.

The commands worth running right now, regardless

If you don't use any tooling and want to check a few things manually:

Check whether a .env file was ever committed, even if it's not there now:

git log --all --full-history -- .env

Find TODO comments referencing credentials that might have shipped:

grep -rniE "TODO.*(key|token|secret)" --include="*.js" --include="*.py" --exclude-dir=node_modules .

Check old branches for sensitive files, with the branch each commit was reached from:

git log --all --full-history --oneline --source -- .env '*.pem' '*.key'

Inspect Docker image build history for secrets baked in during the build (ENV values, build args, or credentials inlined in RUN commands):

docker history --no-trunc <image-name>

Check if secrets are in your frontend build output:

grep -rnoE "sk_live_[0-9A-Za-z]+|AKIA[0-9A-Z]{16}|ghp_[0-9A-Za-z]{36}" ./dist ./build 2>/dev/null

None of this requires a paid tool. It takes 20 minutes and you might find something that's been sitting there for months.

Where this is going

The pattern I keep seeing is that security gets treated as something to sort out later: after the product works, after there are users, after there's funding. By then the habits are set and the exposure has been sitting there for a year.

I don't think small teams are careless. I think the tools made it too hard and too noisy to act on. That's what I'm trying to fix.

The SecOpsium alpha is live at secopsium.com with free Pro access until July 31st, no credit card required. The scanning CLI is fully open source at github.com/secopsium/secopsium-cli if you want to look at how the scanning layer works.

If you run it and the prioritization feels wrong, or you're getting too many false positives in a specific category, I want to know. That feedback is more valuable to me right now than anything else.

Wael Matoussi — Founder, SecOpsium

DEV Community