Kenny Olawuwo.

Posted on Jun 6

AI Slop Is Becoming a Software Engineering Problem

#ai #aislop #scanaialop #claude

AI coding tools have changed how software gets written.

Developers are now using Cursor, Claude Code, Codex, Copilot, Windsurf, Cline, Lovable, Bolt and other agentic tools to move faster than ever. They can generate components, refactor services, write tests, scaffold APIs, migrate frameworks and explain unfamiliar codebases in minutes.

That speed is real.

But so is the mess it can leave behind.

The more I used AI coding agents, the more I started noticing the same patterns across different projects. Not always broken code. Not always obviously bad code. But code that felt slightly off.

Code that worked enough to pass a quick check, but carried strange decisions, unnecessary wrappers, swallowed errors, fake-looking abstractions, unused imports, hardcoded values, duplicated logic and comments that sounded confident but added no value.

That is what I call AI slop.

Not because AI-generated code is automatically bad. It is not. Some of it is genuinely useful.

AI slop is the residue left behind when code is generated quickly but not properly cleaned, validated or shaped into something maintainable.

And as more AI-written code reaches production, I think this is becoming one of the next important software engineering problems.

What AI slop looks like

AI slop is not one single thing.

It is a category of patterns that tend to appear when coding agents are trying to be helpful, defensive or overly complete.

For example:

try {
  await saveUser(user);
} catch (error) {
  // ignore
}

A swallowed error like this might look harmless in a generated flow, but in production it can hide the exact failure you need to debug.
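A minimal alternative is to add context and rethrow instead of discarding the error, so the failure still surfaces in logs and to callers. In this sketch the `User` type, the validation inside `saveUser`, and the `saveUserOrFail` wrapper are hypothetical stand-ins for illustration:

```typescript
// Hypothetical stand-ins: a real saveUser would talk to a database or API.
interface User {
  id: string;
  email: string;
}

async function saveUser(user: User): Promise<void> {
  if (user.email.indexOf("@") === -1) {
    throw new Error(`invalid email for user ${user.id}`);
  }
}

// Instead of an empty catch, attach context and rethrow so the
// original failure is still visible to callers and logs.
async function saveUserOrFail(user: User): Promise<void> {
  try {
    await saveUser(user);
  } catch (error) {
    const reason = error instanceof Error ? error.message : String(error);
    throw new Error(`saveUser failed for user ${user.id}: ${reason}`);
  }
}
```

Rethrowing is only one option; logging and returning an explicit failure value also works, as long as the error is not silently dropped.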

Another common one:

const data = response as any;

This is the classic “make TypeScript stop complaining” move. It gets the code past the compiler, but removes the safety you were using TypeScript for in the first place.
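A safer pattern is to keep untrusted data typed as `unknown` and narrow it with a type guard, so the compiler still checks every later access. This is a sketch assuming the response is a JSON string and that `UserDto`, `isUserDto` and `parseUser` are hypothetical names:

```typescript
// Hypothetical shape the caller expects; not taken from any real API.
interface UserDto {
  id: string;
  name: string;
}

// Runtime check that narrows `unknown` to UserDto without an `as any` escape hatch.
function isUserDto(value: unknown): value is UserDto {
  if (typeof value !== "object" || value === null) return false;
  const record = value as Record<string, unknown>;
  return typeof record.id === "string" && typeof record.name === "string";
}

function parseUser(json: string): UserDto {
  const data: unknown = JSON.parse(json);
  if (!isUserDto(data)) {
    throw new Error("response did not match UserDto");
  }
  return data; // narrowed to UserDto by the type guard
}
```

Malformed input now fails loudly at the boundary instead of propagating as `any` through the rest of the code.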

Then you see things like this:

// This function processes the user data and returns the processed user data
function processUserData(userData) {
  return userData;
}

The comment adds nothing. The function adds nothing. But the codebase now has more noise.

Other examples include:

unused imports left behind after multiple agent iterations
hardcoded URLs, IDs or configuration values
TODO comments that become permanent
over-defensive validation around already-typed inputs
re-declared types that already exist elsewhere in the codebase
dead code from previous attempts
half-renamed variables
duplicate helper functions
console logs left in production paths
broad catch blocks that hide real failures
hallucinated imports or dependencies
oversized functions generated in one pass

Individually, each one can look small.

Together, they make a codebase harder to trust.
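Some patterns in the list above can be checked mechanically. As a rough, hypothetical sketch (not aislop's implementation), a regex-based check for unused named imports in a TypeScript file might look like this. It ignores default, namespace and `import type` statements, and it cannot tell a real usage from a mention inside a comment or string:

```typescript
// Rough sketch: flags named imports whose local names never appear again in the file.
function findUnusedNamedImports(source: string): string[] {
  const importPattern = /import\s*\{([^}]*)\}\s*from\s*["'][^"']+["'];?/g;
  // Drop the import statements so an import does not count as its own usage.
  const body = source.replace(importPattern, "");
  importPattern.lastIndex = 0;
  const unused: string[] = [];
  let match: RegExpExecArray | null;
  while ((match = importPattern.exec(source)) !== null) {
    for (const spec of match[1].split(",")) {
      // Handle inline `type` modifiers and `name as alias` (the alias is the local binding).
      const cleaned = spec.trim().replace(/^type\s+/, "");
      if (cleaned === "") continue;
      const parts = cleaned.split(/\s+as\s+/);
      const local = parts[parts.length - 1];
      // Assumes plain word-character identifiers (no `$`).
      if (!new RegExp(`\\b${local}\\b`).test(body)) unused.push(local);
    }
  }
  return unused;
}
```

A production tool would more likely walk a real syntax tree, but even this crude version is deterministic: the same file always yields the same result.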

The problem is not that AI writes bad code

I do not think the right framing is “AI code is bad.”

That is too lazy.

The real issue is that AI coding agents optimise for completion. They are trying to satisfy the prompt, produce something plausible and keep the workflow moving.

That means they can often generate code that looks done before it has been properly shaped.

Human developers do this too, of course.

The difference is scale.

A developer might write messy code slowly. An AI agent can generate messy code across several files in seconds.

That changes the review problem.

Before, code review was mainly about checking what another human intentionally wrote. Now, review increasingly includes checking what an agent generated, why it generated it, whether it actually fits the surrounding codebase, and whether it introduced subtle maintainability debt.

That is a different kind of burden.

Existing tools were not really built for this

We already have linters, formatters, static analysis tools, type checkers, security scanners and AI code reviewers.

They all matter.

ESLint can catch syntax and style issues. TypeScript can catch type errors. Snyk can catch known vulnerabilities. Sonar can flag quality issues. AI review tools can comment on pull requests.

But AI slop sits in a slightly different space.

It is not always a compiler error.

It is not always a security vulnerability.

It is not always something a formatter can fix.

And by the time it reaches pull request review, the code has already been accepted into the developer’s workflow. Someone now has to spend time untangling it.

The earlier you catch it, the cheaper it is to fix.

That is why I think the next layer of developer tooling needs to move closer to the point of generation.

Not just after the pull request.

Not just after CI.

But while the agent is producing the code.

AI-generated code needs a quality gate

When agents become part of the development workflow, they also need guardrails.

A quality gate for AI-generated code should be fast, deterministic and easy to run locally. It should not need another large model to judge the output. It should not give a different answer every time. It should catch repeatable patterns that developers already know are risky or noisy.

That is the direction I have been exploring with aislop.

aislop is an open-source CLI that scans code for patterns commonly left behind by AI coding agents. It gives a score, highlights findings and focuses on things like swallowed errors, unsafe as any, dead code, hallucinated imports, hardcoded values, useless comments, oversized functions and other slop patterns.

The goal is not to replace human review.

The goal is to stop obvious AI-generated mess from reaching human review in the first place.

A simple workflow could look like this:

npx aislop scan

Or inside an agentic workflow:

npx aislop scan --json

The agent writes code, the scanner runs, and the feedback goes back into the loop before the code reaches the pull request.

That is the part I find most interesting.

Not just:

Scan my repo.

But:

Before you tell me the task is complete, prove that the code is clean enough to continue.

That is where this category becomes useful.
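The generate, scan and feed-back loop described above can be sketched with the generator and scanner injected as plain functions. In a real setup the scanner might shell out to `npx aislop scan --json`, but aislop's JSON schema is not described in this post, so the `Finding` shape, the `generateUntilClean` name and the `maxAttempts` budget below are all hypothetical:

```typescript
// Hypothetical finding shape; aislop's real JSON output may differ.
interface Finding {
  rule: string;
  message: string;
}

// The generator receives findings from the previous scan (empty on the first try).
type Generate = (feedback: Finding[]) => string;
type Scan = (code: string) => Finding[];

interface LoopResult {
  code: string;
  attempts: number;
  clean: boolean;
}

// Regenerate until the scan reports no findings or the attempt budget runs out.
// Kept synchronous for brevity; real agent and scanner calls would be async.
function generateUntilClean(generate: Generate, scan: Scan, maxAttempts = 3): LoopResult {
  let feedback: Finding[] = [];
  let code = "";
  for (let attempt = 1; attempt <= maxAttempts; attempt++) {
    code = generate(feedback);
    feedback = scan(code);
    if (feedback.length === 0) {
      return { code, attempts: attempt, clean: true };
    }
  }
  return { code, attempts: maxAttempts, clean: false };
}
```

When the budget runs out, the result is marked as not clean, which is the point where a human should step in rather than the agent reporting the task as complete.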

Deterministic checks still matter

There is a lot of excitement around LLM-based code review, and I understand why. LLMs can explain, reason and catch things that simple rules may miss.

But not everything needs another model.

Some patterns are deterministic by nature.

An empty catch block is either there or it is not.

An unsafe cast is either there or it is not.

An unused import is either there or it is not.

A hardcoded secret-like value is either there or it is not.

A file with large generated functions, repeated helpers and useless comments can be scored and flagged without sending it to an LLM.

That matters because deterministic tools are fast, cheap, repeatable and easier to trust in CI.

When code is being generated more frequently, we need tools that can keep up with that speed.
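To show how simple some of these checks can be, here is a regex-based sketch (hypothetical, not aislop's actual rules) that flags empty or comment-only catch blocks and `as any` casts with 1-based line numbers. Because it works on raw text, it can misfire on matches inside strings or comments:

```typescript
interface Finding {
  rule: "empty-catch" | "as-any";
  line: number;
}

// 1-based line number of a character offset.
function lineOf(source: string, index: number): number {
  return source.slice(0, index).split("\n").length;
}

// Naive, deterministic pattern checks over raw source text.
function scanSource(source: string): Finding[] {
  const findings: Finding[] = [];
  // A catch block containing only whitespace and // or /* */ comments.
  const emptyCatch = /catch\s*(\([^)]*\))?\s*\{(\s|\/\/[^\n]*|\/\*[\s\S]*?\*\/)*\}/g;
  // `as any` used as a type assertion.
  const asAny = /\bas\s+any\b/g;
  let m: RegExpExecArray | null;
  while ((m = emptyCatch.exec(source)) !== null) {
    findings.push({ rule: "empty-catch", line: lineOf(source, m.index) });
  }
  while ((m = asAny.exec(source)) !== null) {
    findings.push({ rule: "as-any", line: lineOf(source, m.index) });
  }
  return findings.sort((a, b) => a.line - b.line);
}
```

Running it on the snippets from earlier in this post flags the `// ignore` catch block and the `response as any` line, and it gives the same answer every time.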

False positives are the hard part

The hardest part of building this kind of tool is not finding bad patterns.

It is avoiding lazy judgement.

There is a big difference between code that looks like slop and code that is intentionally written a certain way for a good reason.

That is why false positives matter a lot.

I learned this the hard way when someone ran aislop on a mature open-source Python project and the score came out badly. At first glance, it looked like the tool had found a lot. But after going through the findings one by one, many of them were not genuine issues. They were bugs in my own detection logic.

So I fixed them.

That experience made the tool better.

It also made the philosophy clearer: the goal is not to shame codebases or produce dramatic scores. The goal is to give developers useful, fair feedback that helps them clean up the residue from AI-assisted development.

A tool like this only earns trust when developers can look at the findings and say:

Yeah, that is fair.
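One common source of false positives in text-based checks is matching inside comments or string literals. A minimal mitigation, sketched below under the assumption that checks run on raw text, is to blank out comment and string contents (keeping newlines so line numbers survive) before matching a pattern such as `as any`. The function names are hypothetical, and the masking is a simplification: it ignores regex literals and `${...}` expressions inside template strings, which a real parser would handle:

```typescript
// Replace comment and string-literal contents with spaces, preserving newlines
// so offsets and line numbers stay aligned with the original source.
function maskCommentsAndStrings(source: string): string {
  const out: string[] = [];
  let i = 0;
  while (i < source.length) {
    const ch = source[i];
    const next = source[i + 1];
    if (ch === "/" && next === "/") {
      // Line comment: blank until end of line.
      while (i < source.length && source[i] !== "\n") {
        out.push(" ");
        i++;
      }
    } else if (ch === "/" && next === "*") {
      // Block comment: blank through the closing */.
      const end = source.indexOf("*/", i + 2);
      const stop = end === -1 ? source.length : end + 2;
      for (; i < stop; i++) out.push(source[i] === "\n" ? "\n" : " ");
    } else if (ch === '"' || ch === "'" || ch === "`") {
      // String literal: keep the quotes, blank the contents, honor escapes.
      out.push(ch);
      i++;
      while (i < source.length && source[i] !== ch) {
        if (source[i] === "\\" && i + 1 < source.length) {
          out.push(" ", source[i + 1] === "\n" ? "\n" : " ");
          i += 2;
          continue;
        }
        out.push(source[i] === "\n" ? "\n" : " ");
        i++;
      }
      if (i < source.length) {
        out.push(ch);
        i++;
      }
    } else {
      out.push(ch);
      i++;
    }
  }
  return out.join("");
}

function countAsAny(source: string): number {
  const matches = maskCommentsAndStrings(source).match(/\bas\s+any\b/g);
  return matches === null ? 0 : matches.length;
}
```

On a file where "as any" appears once as a real cast, once in a comment and once in a string, a raw regex reports three hits while the masked version reports one.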

This category will become more important

The more code agents write, the more teams will need standards around agent output.

I think we will start seeing questions like:

Did the agent introduce unnecessary abstractions?
Did it leave behind dead code?
Did it swallow errors?
Did it use unsafe casts to force the code through?
Did it duplicate existing logic instead of reusing what was already there?
Did it generate comments that explain nothing?
Did it hardcode things that should live in config?
Did it produce code that passes locally but creates long-term maintenance debt?

These are not theoretical questions.

They are already showing up in real workflows.

The companies and teams that benefit most from AI coding tools will not be the ones that blindly accept every generated diff. They will be the ones that build strong feedback loops around generation, validation, testing, review and cleanup.

AI can help us write code faster.

But speed without quality control just moves the bottleneck somewhere else.

The tool: aislop

This is why I started building aislop.

aislop is an open-source CLI for detecting the kind of code quality issues AI coding agents often leave behind. It scans your codebase locally, gives a score from 0 to 100, and highlights patterns like swallowed errors, unsafe as any, dead code, hallucinated imports, hardcoded values, useless comments, oversized functions, duplicated logic and other signs of AI-generated mess.
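This post does not describe how aislop computes its score, so the following is purely a hypothetical illustration of one way a 0 to 100 score could be derived deterministically: start at 100, subtract a fixed penalty per finding by severity, and clamp at zero. The severity names and weights are invented for this sketch:

```typescript
type Severity = "error" | "warning" | "info";

// Invented weights for illustration only.
const PENALTY: Record<Severity, number> = { error: 10, warning: 3, info: 1 };

// Deterministic: the same list of findings always produces the same score.
function score(severities: Severity[]): number {
  const total = severities.reduce((sum, s) => sum + PENALTY[s], 0);
  return Math.max(0, 100 - total);
}
```

A flat per-finding penalty means a large codebase can score badly simply by having more code, so normalizing by lines of code or file count is a reasonable alternative to consider.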

You can try it with:

npx aislop scan

For CI or agentic workflows, you can also use JSON output:

npx aislop scan --json

Links:

The idea is simple:

Before AI-generated code reaches pull request review, run a fast deterministic quality gate over it.

Not to replace human review.

But to catch the obvious slop early, while the agent or developer can still fix it quickly.

Top comments (2)

Andreas Müller • Jun 6

Thanks for the effort. For certain types of errors this will definitely be useful.

But I still think the best quality gate right after generation is your own pair of eyes, or that of another experienced developer. This kind of tool seems very helpful when you let the AI generate a lot of code at once.

But this is not how you need to work with AI (avoid peer pressure / media pressure at all costs, in programming as in life). I get around the AI slop problem mostly by using the AI only in very small, well-defined contexts. Think single method, single class or at most 3 or 4 class changes. Some may say I'm not getting nearly as much out of the AI that way as is possible.

I humbly disagree. Yes, in terms of raw speed I could get much more out of AI. However, my priority is not raw speed. My priority is producing high quality software solutions. Which means I use the AI where it makes sense for my workflow, and my workflow has always been: Small change, review, verify. Another small change, review, verify. Repeat until the problem is solved.

In that workflow, AI still makes me a lot faster, especially by writing most of my single class unit tests for me, or extending / modifying single class unit tests. But the key is keeping the AI workload small and focused. Just as you would do with your own workloads. I rarely ever experience AI slop that way, and in that kind of workflow I have no real need for a tool like the one you're implementing, because I can easily catch those things by reading the small changes the AI makes (which I do of course).

However, I realize not all developers have that luxury. Sadly many seem to be pressured into doing the opposite: Let AI take on larger and larger workloads at once. Or they are falling for the siren song of just letting AI do all their work for them. Whatever it is, I think many will need a tool like yours in their workflows. Which is sad, because it shouldn't be necessary ideally. Ideally we would limit the AI daily output to a quantity we can still carefully review. But alas, I think it is not to be, because reason is powerless in the face of the almighty productivity gain.

xulingfeng • Jun 7

'Code that worked enough to pass a quick check, but carried strange decisions, unnecessary wrappers, swallowed errors' — you nailed the distinction. It's not about AI writing bad code. It's about code that passes the first pass but contains assumptions nobody validated. The 'not broken ≠ not right' gap is where the expensive incidents come from.