DEV Community

scarab systems

AI Code Drift in the Wild: A Scarab Diagnostic Repair Pass

Scarab Field Test: Repairing an AI-Generated App Without Guessing Its Intended Baseline

I’ve been building Scarab Diagnostic Suite around a problem I keep seeing in AI-assisted development: the app may look close to working, the code may be mostly there, and some checks may even pass, but the repo still isn’t in a trustworthy state.

So I tested Scarab against a public GitHub repo whose owner was explicitly asking for help with an AI-generated web app. The app had come out of a generated, vibe-coded workflow, and the owner wanted it cleaned up, its broken behavior fixed, and the whole thing made more stable.

The interesting part wasn’t just “can the code be fixed?”

The interesting part was: what does fixed mean for this repo?

Scarab’s repair pass surfaced two valid repair postures:

  1. TypeScript intended — treat npm run typecheck as a real acceptance gate.
  2. Build/lint only — treat the app as a generated JavaScript React export, where build + lint are the intended acceptance boundary.

That distinction matters because a diagnostic suite should not blindly impose a standard the repo never chose. Sometimes the repair is not just technical. Sometimes the repair is clarifying the repo’s actual operating baseline.
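To make the two postures concrete, here is a minimal sketch of how the same check results can produce different verdicts depending on which baseline the repo chose. This is not Scarab’s implementation: the names `Posture`, `GateResult`, `gatesFor`, and `isAccepted` are hypothetical, and `npm run build` and `npm run lint` are assumed script names (only `npm run typecheck` is named above).

```typescript
// Illustrative only: Posture, GateResult, gatesFor, and isAccepted are
// hypothetical names, not part of Scarab.
type Posture = "typescript-intended" | "build-lint-only";

interface GateResult {
  command: string;
  passed: boolean;
}

// Commands that count as acceptance gates under each posture.
// "npm run build" and "npm run lint" are assumed script names.
function gatesFor(posture: Posture): string[] {
  const shared = ["npm run build", "npm run lint"];
  return posture === "typescript-intended"
    ? shared.concat(["npm run typecheck"])
    : shared;
}

// Accepted only when every gate the posture requires has a passing result.
// A gate with no recorded result counts as not passed.
function isAccepted(posture: Posture, results: GateResult[]): boolean {
  return gatesFor(posture).every((command) =>
    results.some((r) => r.command === command && r.passed)
  );
}

// Same results, two verdicts: the typecheck failure only matters
// when the repo has chosen the TypeScript-intended posture.
const sampleResults: GateResult[] = [
  { command: "npm run build", passed: true },
  { command: "npm run lint", passed: true },
  { command: "npm run typecheck", passed: false },
];
console.log(isAccepted("typescript-intended", sampleResults)); // false
console.log(isAccepted("build-lint-only", sampleResults)); // true
```

The point of the sketch is that the posture is an explicit input. The verdict changes because the baseline changed, not because the code did.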

Both repaired versions now:

  • build successfully
  • lint successfully
  • run locally in the browser
  • render the app correctly
  • include saved runtime evidence/screenshots
  • pass browser smoke checks across key routes

One of the more useful findings was that static checks were not enough. A governance/static pass could look clean while the browser runtime still revealed real problems: stray generated stub text, React not mounting meaningful app content, and missing local Base44 helper behavior outside the hosted runtime.
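A rough sketch of what a runtime check along these lines could look like, separate from any static pass. Everything here is an assumption for illustration: `RouteSnapshot`, `RuntimeFinding`, `inspectSnapshot`, and `failingRoutes` are hypothetical names, the snapshot is assumed to come from whatever browser automation the repo already uses, and the stub markers are placeholder strings rather than the actual text found in this repo. An empty mount node is only a weak proxy for “no meaningful content”, and the missing Base44 helper behavior is not modeled.

```typescript
// Illustrative only: all names below are hypothetical, not Scarab's API.
interface RouteSnapshot {
  route: string;
  rootText: string | null; // text inside the React mount node; null if the node was not found
  bodyText: string; // all visible text on the page
}

type RuntimeFinding =
  | "mount-node-missing"
  | "mount-node-empty"
  | "stub-text-visible";

// Inspect one captured route for the runtime problems a static pass cannot see.
function inspectSnapshot(
  snapshot: RouteSnapshot,
  stubMarkers: string[]
): RuntimeFinding[] {
  const findings: RuntimeFinding[] = [];
  if (snapshot.rootText === null) {
    findings.push("mount-node-missing");
  } else if (snapshot.rootText.trim().length === 0) {
    findings.push("mount-node-empty");
  }
  // Case-insensitive search for leftover generated stub text.
  const visible = snapshot.bodyText.toLowerCase();
  if (stubMarkers.some((m) => visible.indexOf(m.toLowerCase()) !== -1)) {
    findings.push("stub-text-visible");
  }
  return findings;
}

// Run the inspection across key routes and keep only the routes with findings.
function failingRoutes(
  snapshots: RouteSnapshot[],
  stubMarkers: string[]
): { route: string; findings: RuntimeFinding[] }[] {
  return snapshots
    .map((s) => ({ route: s.route, findings: inspectSnapshot(s, stubMarkers) }))
    .filter((r) => r.findings.length > 0);
}
```

A route that builds and lints cleanly can still come back from `failingRoutes` with `mount-node-empty`, which is the gap between a green command and a rendered app.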

That is exactly the kind of failure I’m interested in.

Not just “does the code pass a command?”

But:

  • does the app actually render?
  • does the local runtime behave?
  • did the repair preserve the app’s intent?
  • did the repo become more coherent afterward?
  • is there evidence, not just confidence?

This is the distinction I keep coming back to with AI-generated code: completion is not the same as repo health.

A coding agent can produce code quickly. A repair pass can make code green. But a repo still needs a way to ask whether the result matches its actual baseline, whether the verification boundary is appropriate, and whether the system has returned to a quieter, more trustworthy state.

That is the space Scarab is being built for: repo-local diagnostics and repair for AI-assisted development.

Not another coding agent.

A way to help the repo prove what is true after the agent is done.
