I have started running Scarab Diagnostic Suite against real open-source issues as field tests.
Not synthetic examples.
Not toy repos.
Not controlled demo fixtures.
Real repos.
Real failures.
Real diagnostic pressure.
The first field test was against Open WebUI issue #25078, where Azure OpenAI connections broke after connection settings were edited.
The reported failure had a clear shape:
A working Azure config originally included:
text azure: true
After editing the connection, the saved config could become:
text provider: "azure"
without preserving the older azure flag.
The backend runtime path then failed to recognize the connection as Azure-shaped and fell through to a generic OpenAI /models path.
The result was an Azure request going to the wrong endpoint shape and failing.
That is the kind of bug that looks obvious once named, but can become noisy during repair because it touches multiple surfaces:
configuration persistence,
provider metadata shape,
runtime provider branching,
Azure endpoint behavior,
model listing behavior.
The purpose of this first field test was not to claim blind discovery.
This was a guided diagnostic repair pass.
The issue was known. The failure was described. The test was whether Scarab Diagnostic Suite could take a known live issue and convert it into a bounded diagnostic repair lane.
What SDS found
SDS identified the issue as a provider/config contract failure:
text provider_config_contract.runtime_branch_missing_provider_shape
Plain English:
Open WebUI had more than one persisted way to describe an Azure connection, but the runtime branch only recognized one of them.
The broken contract was:
text provider: "azure"
could exist in persisted config, while the runtime path only checked:
text azure: true
That meant Azure-shaped config could be treated as generic OpenAI config.
This is the kind of thing Scarab Diagnostic Suite is designed to name.
Not:
“Something is wrong in this file.”
But:
“A provider identity contract exists in one layer and is not being honored by another layer.”
That distinction matters.
A bug report gives a symptom.
A diagnostic suite should identify the contract that broke.
The repair lane
The local repair was intentionally small.
The runtime path needed to treat both of these as Azure-shaped config:
text azure: true provider: "azure"
The repair did not need to redesign provider configuration.
It did not need to rewrite connection persistence.
It did not need to touch unrelated OpenAI provider behavior.
The repair lane was:
text Persisted provider shape → runtime provider branch
That was the contract boundary.
The regression test proved the issue-shaped failure before repair and passed after repair.
Before repair, the failing path attempted a generic /models request against an Azure base URL.
After repair, the Azure provider-shaped config was routed through the Azure-aware branch.
SDS after-check
After the repair, SDS reran the same focused diagnostic lane.
The original provider/config finding cleared.
That is the loop I wanted to test:
diagnose the contract boundary,
repair only that boundary,
prove the behavior with a focused regression,
rerun diagnostics,
confirm the finding cleared.
This is the first important Scarab pattern:
The point is not just that the code changed. The point is that the diagnostic finding changed in response to a bounded repair.
That gives the repair a shape.
It is not just “the agent patched something.”
It is:
SDS identified a contract failure.
Codex repaired that contract failure.
A regression proved the behavior.
SDS confirmed the targeted finding cleared.
What this field test proved
Field Test #001 proved a specific thing:
Scarab Diagnostic Suite can take a known live bug report, identify the underlying contract boundary, constrain the AI repair lane, and verify that the targeted finding clears after repair.
It did not prove blind bug discovery.
It did not prove full repo repair.
It did not prove that SDS should own the fix.
It proved something more practical:
A diagnostic suite can keep an AI coding agent from turning a known bug into an overbroad patch.
That is the product thesis beginning to show.
AI coding agents are fast, but speed without boundaries creates drift.
Scarab’s role is not to be the coder.
Scarab’s role is to maintain repo truth while the coding agent operates.
In this field test, that meant turning a provider/config bug into a narrow, testable repair lane.
Internal classification
Field Test #001
Project: Open WebUI
Issue: #25078
Surface: provider/config contract
Mode: guided diagnostic repair pass
Primary finding: provider_config_contract.runtime_branch_missing_provider_shape
Repair lane: persisted provider metadata shape → backend runtime branch
Result: local repair successful; focused regression passed; SDS finding cleared
Primary lesson: SDS can translate a known issue into a bounded contract repair lane
Why this matters
A lot of AI-assisted coding fails because the agent starts with a symptom and then wanders.
It fixes nearby code.
It edits unrelated files.
It changes tests to match the patch.
It follows the noise instead of the boundary.
It makes the repo harder to reason about.
This field test showed the opposite pattern:
one issue,
one contract boundary,
one focused regression,
one bounded repair,
one diagnostic after-check.
That is the repair style I want Scarab Diagnostic Suite to enforce.
Not louder AI.
Better boundaries.
Top comments (0)