Scarab Systems

Posted on Jun 11

Scarab Field Lab Is Public: A Case File Repo for Diagnostic Field Tests

#ai #devops #discuss #testing

I opened the public Scarab Field Lab this week:

https://github.com/scarab-systems/scarab-field-lab

This repo is not the Scarab Diagnostic Suite source code. It is not a patch farm. It is not a feed of AI-generated fixes.

It is a public case-file repo for Scarab Diagnostic Suite field tests.

The goal is simple: when Scarab/SDS is used to investigate a real public software issue, the field lab records the public-safe diagnostic trail. That includes the target repo, issue or PR links, the diagnostic finding, the mode of the case, the public status, and any validation summary that can be shared safely.

In other words: if I say Scarab helped narrow a boundary failure, the field lab is where the public record lives.

Why make this a GitHub repo?

Field-test work needs structure.

A DEV.to article is useful for explaining a case. A GitHub issue or PR is useful for upstream review. But neither of those is the right place to preserve the whole diagnostic record across many projects.

The field lab gives the work a stable public home. It lets people see which public issues were examined, which cases stayed diagnostic-only, which cases produced local repair candidates, which cases became upstream PRs, which cases were accepted upstream, which claims are supported by public evidence, and which claims are deliberately limited.

That last part matters.

A case being listed in the field lab does not mean the upstream project endorsed Scarab. It does not mean a patch was accepted. It does not mean Scarab “fixed” the project. The status field says what actually happened.

That keeps the record honest.

What a case file is

A case file is a small public record of a diagnostic field test.

It usually answers a few basic questions:

What project was examined?
What public issue or PR is connected to the case?
What mode is the case in?
What specific boundary or failure shape was identified?
Was a patch prepared?
Was an upstream PR opened?
Was validation run?
What is safe to publicly claim?

That is it.

No giant transcript. No target repo clone. No private local paths. No unpublished maintainer correspondence. No “trust me, the AI said so.”

The field lab is designed to publish enough information to make the work inspectable without turning the repo into a junk drawer of private run artifacts.
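As a minimal sketch, a case file that answers the questions above could be modeled as a small typed record. The class name, field names, and example values below are hypothetical illustrations, not the field lab's actual schema.

```python
import json
from dataclasses import asdict, dataclass
from typing import List, Optional


@dataclass
class CaseFile:
    """Hypothetical case record: one field per question a case file answers."""
    project: str                       # What project was examined?
    public_links: List[str]            # Which public issue or PR is connected?
    mode: str                          # What mode is the case in?
    finding: str                       # What boundary or failure shape was identified?
    patch_prepared: bool               # Was a patch prepared?
    upstream_pr: Optional[str]         # Was an upstream PR opened? (None if not)
    validation_summary: Optional[str]  # Was validation run? (None if not)
    safe_claim: str                    # What is safe to publicly claim?


def to_json(case: CaseFile) -> str:
    """Serialize the record as stable, human-readable JSON."""
    return json.dumps(asdict(case), indent=2, sort_keys=True)


# Placeholder values only; this is not a real case.
example = CaseFile(
    project="example-org/example-project",
    public_links=["https://example.com/example-org/example-project/issues/0"],
    mode="diagnostic-proof",
    finding="Placeholder description of the boundary that was identified.",
    patch_prepared=False,
    upstream_pr=None,
    validation_summary=None,
    safe_claim="A diagnostic finding was recorded; no patch is claimed.",
)

print(to_json(example))
```

Keeping the record this small is the point: anything that does not answer one of the questions has no field to live in.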

Modes and statuses

The repo separates diagnostic modes from public outcomes.

A case can be diagnostic-proof. That means SDS recorded a finding, but no public patch is being claimed.

A case can be repair. That means a local or prepared repair exists, but that does not automatically mean upstream accepted it.

A case can be diagnostic-proof-and-repair. That means the diagnostic finding and a repair candidate are both recorded.

A case can be upstream-pr-recorded. That means a human-reviewed PR or draft PR is publicly linked.

A case can be upstream-accepted. That means an upstream maintainer accepted or merged the public PR.

Those distinctions are boring on purpose. They stop everything from collapsing into a vague “we fixed it” claim.

For developers, that matters because the difference between “I found a boundary,” “I prepared a patch,” “I opened a PR,” and “the project merged it” is not cosmetic. Those are different facts.

The field lab keeps them separate.
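One way to keep those facts from blurring is to check a case against the definitions above before it is published. The sketch below is an assumption-heavy illustration: the split of the five labels into a `Mode` enum and a `Status` enum, and the `check_claims` helper, are hypothetical and are not the repo's real validation logic.

```python
from enum import Enum
from typing import List, Optional


class Mode(Enum):
    # Assumed to be the diagnostic modes.
    DIAGNOSTIC_PROOF = "diagnostic-proof"
    REPAIR = "repair"
    DIAGNOSTIC_PROOF_AND_REPAIR = "diagnostic-proof-and-repair"


class Status(Enum):
    # Assumed to be the public outcomes.
    UPSTREAM_PR_RECORDED = "upstream-pr-recorded"
    UPSTREAM_ACCEPTED = "upstream-accepted"


def check_claims(
    mode: Mode,
    status: Optional[Status],
    pr_url: Optional[str],
    claims_patch: bool,
) -> List[str]:
    """Return a list of inconsistencies; an empty list means the claims line up."""
    problems = []
    # diagnostic-proof: a finding is recorded, but no public patch is claimed.
    if mode is Mode.DIAGNOSTIC_PROOF and claims_patch:
        problems.append("diagnostic-proof cases must not claim a public patch")
    # Both upstream statuses are defined in terms of a publicly linked PR.
    if status is not None and not pr_url:
        problems.append(f"{status.value} requires a public PR link")
    return problems


print(check_claims(Mode.REPAIR, Status.UPSTREAM_ACCEPTED, None, True))
```

A repair with no upstream status passes this check, which matches the definition above: a prepared repair says nothing about upstream acceptance.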

The mechanical diagnostics boundary

One of the main reasons I wanted this repo public is to make the Scarab boundary explicit.

Scarab Diagnostic Suite is not an AI coding agent.

The diagnostic suite is mechanical. It inspects repository evidence, compares expected and observed behavior, and records specific findings.

It does not use generative model reasoning to decide what is true. It does not submit unattended patches. It does not treat an AI response as validation.

AI assistance may enter later. For example, AI-assisted tooling may help draft a narrow patch, summarize a diagnostic record, organize validation notes, or prepare a maintainer-facing explanation.

But that happens after the diagnostic evidence exists.

The separation looks like this:

SDS finds evidence.
A human reviews and owns the claim.
AI may assist with implementation or writing.
Maintainers decide what belongs in their project.

That is the operating model.

Why this matters in practice

A lot of debugging starts from a symptom.

A command fails outside a project directory. A compiler path accepts a type it should reject. A response API hangs instead of settling. A test passes but the behavior is still wrong.

The field-test approach tries not to stop at the symptom.

The useful question is usually:

Which boundary stopped preserving the behavior another part of the system depended on?

That question is where the case file starts. The repair, if there is one, should come after the boundary is understood.

That is why I care about the diagnostic record.

Without the record, a patch can look like a fix while still being hard to review. With the record, a reviewer can see the intended repair lane:

This is the failure shape.
This is the boundary.
This is what should stay unchanged.
This is the narrow behavior being restored.
This is what the test proves.

That is the kind of contribution I want Scarab field tests to produce.

What the repo does not contain

The field lab deliberately does not contain everything.

It does not contain SDS product internals, cloned upstream repos, target worktrees, secrets, local paths, private prompts, private maintainer correspondence, or raw AI transcripts.

That is not because those things are unimportant. It is because a public evidence repo should stay public-safe.

The field lab is for case records and public links. The target project remains the authority over its own source code. The upstream issue or PR remains the authority over upstream review.

The field lab records what Scarab investigated and what claim is safe to make.
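A public-safe rule like this can be partly enforced with a pre-publish scan. The following is a rough sketch under stated assumptions: the two patterns and the `find_unsafe_lines` helper are hypothetical, deliberately simple, and not part of Scarab or the field lab. A real check would need a broader pattern set and a human review on top.

```python
import re
from typing import List, Tuple

# Hypothetical patterns: common home-directory or drive-letter paths,
# and lines that look like a credential assignment.
PATTERNS = [
    ("local path", re.compile(r"(/home/|/Users/|[A-Za-z]:\\)")),
    ("possible secret", re.compile(r"(?i)\b(api[_-]?key|token|password)\s*[:=]")),
]


def find_unsafe_lines(text: str) -> List[Tuple[int, str]]:
    """Return (line number, label) for each line that matches a pattern."""
    hits = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for label, pattern in PATTERNS:
            if pattern.search(line):
                hits.append((lineno, label))
    return hits


draft = "finding: ok\nlog: /home/me/run.txt\ntoken = abc"
print(find_unsafe_lines(draft))
```

A scan like this only catches obvious leaks. It cannot decide whether a claim is safe to make, which stays a human call.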

Why not just publish field reports?

The field reports are useful, but they are narrative. They explain the bug, the boundary, and the repair in plain English.

The field lab is more structured. It is closer to an evidence index.

A DEV.to field report can say:

Here is what happened.

The field lab can say:

Here is the case record.
Here is the public issue.
Here is the public PR if one exists.
Here is the status.
Here is the validation summary.
Here is the claim boundary.

Those two things work together.

The article is the story. The repo is the record.

What I want developers to see

I do not expect every developer to care about Scarab as a product yet. That is fine.

What I want developers to be able to inspect is the method.

Does the case distinguish symptom from boundary?
Does the repair claim stay narrow?
Does the status match the public upstream reality?
Does the case avoid implying maintainer endorsement where none exists?
Does the diagnostic record make the patch easier to reason about?
Does the process reduce noise instead of adding more?

That is the bar I am trying to hold.

Open-source maintainers do not need more confident noise. They need clear, reviewable, bounded contributions.

The public promise

The shortest statement of what the field lab promises is in the Scarab Boundary Contract:

SDS finds evidence.
People make claims.
Maintainers decide.

That is the whole posture.

Scarab does not replace maintainers. It does not override upstream ownership. It does not claim that every local repair belongs upstream. It does not pretend AI confidence is proof.

It records diagnostic evidence so human-owned repair work can start from a clearer boundary.

That is why the field lab exists.

It is a public diagnostic record.

And now it is open.

Top comments (2)

xulingfeng • Jun 11

The way the Field Lab separates modes — "diagnostic-proof" from "repair" from "upstream-accepted" — is honestly my favorite part. Most tools only tell you "we fixed it." Yours shows the chain of evidence, and where each link stops.
That "which boundary stopped preserving the behavior another part depended on" question is going to sit in my head for a while.
Starred and forked. Looking forward to seeing the first cases land.
The day you publish a Field Lab case titled "the diagnostic system got diagnosed" — that's when I know you're truly drinking your own champagne 😏

Scarab Systems • Jun 11 • Edited

haha... that's coming... I'm implementing a pre-flight pass for Scarab for now...