April 28, 2026 was a weird day for me
IBM shipped Bob. Thoughtworks published SPDD. Researchers at Fudan, Peking, and Shanghai AI Lab published Agentic Harness Engineering on arXiv. Microsoft shipped A2A v1 backed by AWS, Cisco, Google, IBM, Salesforce, SAP, and ServiceNow.
Four independent teams. Same day. Same problem: orchestrate AI across a software development workflow.
Every single one of them stopped at generation.
The question nobody answered
How do you know the output is actually good?
They all stop at generation. A human checks the checkpoint. A reviewer approves the step. The system moves on. That's supervision by convention, not by architecture.
I've been working on the answer for nine months.
Meet Pappy
Pappy is a QC role inside Orca that scores every pipeline output before it reaches the user. PASS, WARN, or FAIL with a confidence score. Failed runs trigger an automatic repair loop. Verified runs feed Moonshiner, a distillation pipeline that trains small specialist models from quality-gated data only.
IBM documents what happened. Pappy decides whether it was good enough.
The trace becomes the curriculum.
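To make the loop concrete, here is a minimal sketch of a Pappy-style quality gate: score an output PASS, WARN, or FAIL with a confidence value, retry on FAIL, and feed only verified runs into the distillation buffer. All names (`Verdict`, `score_output`, `run_with_repair`) and the toy scoring rules are illustrative assumptions, not Orca's actual API.

```python
from dataclasses import dataclass
from enum import Enum


class Verdict(Enum):
    PASS = "pass"
    WARN = "warn"
    FAIL = "fail"


@dataclass
class Score:
    verdict: Verdict
    confidence: float  # 0.0-1.0


def score_output(output: str) -> Score:
    # Stand-in scorer: a real gate would run model- or rule-based checks.
    if not output.strip():
        return Score(Verdict.FAIL, 0.95)
    if len(output) < 20:
        return Score(Verdict.WARN, 0.6)
    return Score(Verdict.PASS, 0.9)


def run_with_repair(generate, max_repairs: int = 2):
    """Generate, score, and repair until the gate passes or the budget runs out."""
    training_buffer = []  # verified runs only: the distillation curriculum
    output = generate()
    score = score_output(output)
    for _ in range(max_repairs):
        if score.verdict is not Verdict.FAIL:
            break
        output = generate()  # repair attempt: regenerate and re-score
        score = score_output(output)
    if score.verdict is Verdict.PASS:
        training_buffer.append(output)  # quality-gated data, nothing else
    return output, score, training_buffer
```

The key design point the sketch tries to capture: WARN outputs can still reach the user, but only PASS outputs enter the training buffer, so the distilled specialist never learns from unverified work.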
How the rest maps
Every major architectural decision in Orca has a direct parallel in what shipped on April 28:
Brain handles task decomposition and model routing. That's Bob's multi-model orchestration.
Miranda enforces compliance and human approval gates per task type. That's Bob's configurable checkpoints, except enforcement lives in the protocol rather than in manual configuration.
Benson is the only user-facing voice. One consistent output layer regardless of what ran underneath.
Orca's agent handoff layer is architecturally aligned with the A2A v1 standard the industry ratified this week. AHP is Orca's internal trust layer. A2A is Orca's external compatibility layer. No other system in that April 28 pile has both.
Moonshiner distills verified runs into training data. That's AHE's experience observability pillar.
ARCHITECTURE.md and CLAUDE.md enforce explicit revertible component scope across agent handoffs. That's AHE's component observability pillar.
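The Brain and Miranda rows above can be sketched as a single routing step: pick a model by task type, and refuse to dispatch gated task types without explicit approval. The route table, the `refactor` gate, and the model names are hypothetical placeholders for illustration, not Orca's real configuration.

```python
# Illustrative routing table: task type -> model (assumed names).
ROUTES = {
    "summarize": "small-local-model",
    "refactor": "large-code-model",
}

# Task types that require a human approval gate before dispatch.
REQUIRES_APPROVAL = {"refactor"}


def route(task_type: str, approved: bool = False) -> str:
    """Return the model for a task, enforcing the approval gate in the router itself."""
    if task_type in REQUIRES_APPROVAL and not approved:
        # Enforcement is structural: the call fails, it doesn't rely on convention.
        raise PermissionError(f"{task_type!r} requires human approval")
    return ROUTES.get(task_type, "default-model")
```

Putting the gate inside the router, rather than in a config file someone might forget to set, is the "in the protocol, not manual configuration" distinction the list draws.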
Who built this
I'm a self-taught solo developer in Eastern Kentucky. No CS degree, no co-founder, no local technical peers. I built this over nine months in focused sessions using AI coding agents because I don't write code directly. Two days ago, four independent teams published, on the same day, every major architectural decision I'd already made.
That's either validating or humbling depending on how you look at it. I choose validating.
Try it
v1.2.16 is live. 620 tests passing across 12 packages. Windows installer and portable .exe both available. Apache 2.0. Free. Runs on your machine. You own your data.
Pipeline tracer demo: https://www.loom.com/share/01765a415d0e4027b115427693a8734a
Desktop demo: https://www.loom.com/share/1e94a7c0fb7c476d89d6d1230fb541db
GitHub: https://github.com/junkyard22/Orca
Releases: https://github.com/junkyard22/Orca/releases
The mission is making high-quality AI accessible to everyone at low cost. Orca is the foundation.