I run two AI assistants from different vendors against each other on every non-trivial decision, with a human (me) sitting in the middle as the routing authority. What's surprised me most is not that they disagree — it's how they disagree. They drift toward agreeing with me too quickly, then with each other too quickly, and I've had to design specific friction into the workflow to keep their disagreement productive.
Writing this down because I think the friction patterns might be useful to other people running similar setups.
What this is
I've been working as an operations / systems engineer for about twenty years. Networks first, then servers, then config management, then incident response. The kind of role where the job is mostly making sure nothing breaks, and where every change request comes with a runbook, a verification step, and a rollback plan.
In the last year and a half, AI coding assistants pulled me into the software side of the house. Not because I wanted a career change — because the boundary between "writes code" and "runs systems" got thin enough that ignoring it stopped being an option.
This series is a running log of what I've been finding, written as I go. The disciplines I pulled in from operations turned out to map onto AI collaboration unexpectedly well; where they didn't, I've had to invent something myself. Observation-heavy and prescription-light. I don't think I've discovered anything new; I've stumbled into the same territory a lot of practitioners and researchers are mapping right now, and I'm writing down my coordinates while they're fresh.
What this is not
A few things I want to flag up front.
This is not a survey. I haven't done an exhaustive review of the literature, the toolchain ecosystem, or the practitioner community. I'm certain there are people who've been running similar disciplines longer than me and writing them up better. If you find one, please send me the link — I'll learn from it, and I'd rather correct the record than defend a stale draft.
This is dated material. Anything you read here is a snapshot from early 2026, written from one specific vantage point: a Japanese operations engineer transitioning into AI-assisted software work, with paid Claude / ChatGPT / Gemini subscriptions and no team-scale deployment. If your context differs significantly, the disciplines may transfer poorly or not at all.
The five disciplines this series circles around
A set of operating principles I keep coming back to in my own daily work. Each earns its own article (or several) in the series. The short version, in a "simple to compose" → "complex to compose" order:
1. TTL Satisficing
I cap the number of back-and-forth rounds in any AI-assisted decision. Three is a good default. The point is not to find the perfect answer — it's to converge on a good-enough answer before the next round costs more than the marginal improvement is worth. If three rounds don't converge, I treat that as a signal that the problem is the problem, not the round count.
This is essentially timeout / retry-budget design from network operations, applied to a conversation with an AI.
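As a sketch of what the budget looks like as a control loop, with `ask()`, `refine()`, and `good_enough()` as hypothetical placeholders for whatever client and acceptance check you actually use:

```python
# A minimal sketch of the round budget (hypothetical names throughout).
from typing import Callable

MAX_ROUNDS = 3  # the TTL: a retry budget for one decision, not a quality bar

def ttl_satisfice(
    ask: Callable[[str], str],           # one AI round-trip: prompt -> answer
    refine: Callable[[str, str], str],   # builds the next prompt from the last answer
    good_enough: Callable[[str], bool],  # acceptance check, not perfection check
    prompt: str,
) -> str | None:
    for _ in range(MAX_ROUNDS):
        answer = ask(prompt)
        if good_enough(answer):
            return answer  # converged within budget: take it and move on
        prompt = refine(prompt, answer)
    # Budget exhausted. The signal is "reframe the problem", not "raise the TTL".
    return None
```

Returning `None` instead of looping again is deliberate: an exhausted budget is an output in its own right, not an error to paper over.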
2. The Karpathy Rule (Simplicity First, Surgical Changes)
Andrej Karpathy's framing, adapted to my workflow: I don't let the AI add what wasn't asked for, and I don't let it edit what wasn't in scope. Speculative additions are how technical debt grows at AI speed. Keeping the diff small and the proposal narrow is a discipline I have to actively enforce.
This maps cleanly onto YAGNI ("You Aren't Gonna Need It") and minimal-diff code review culture — pre-AI principles, applied to a faster-moving collaborator.
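Part of the enforcement can be mechanical. Here is a sketch of a scope gate that rejects a proposed diff before anyone reads a hunk body; the parsing is deliberately naive (it only reads unified-diff `+++ b/<path>` headers) and the file names are made up:

```python
# A sketch of a mechanical scope gate. The parsing is deliberately naive
# (it only reads unified-diff "+++ b/<path>" headers) and the file names
# are invented for illustration; the gate is the point, not the parser.
def out_of_scope_files(diff_text: str, allowed: set[str]) -> set[str]:
    touched = set()
    for line in diff_text.splitlines():
        if line.startswith("+++ b/"):
            touched.add(line[len("+++ b/"):].strip())
    return touched - allowed

proposed_diff = """\
+++ b/app/reconcile.py
+++ b/app/utils.py
"""
violations = out_of_scope_files(proposed_diff, allowed={"app/reconcile.py"})
if violations:  # {'app/utils.py'}: the "helpful" edit nobody asked for
    print(f"Rejecting diff, out-of-scope edits: {sorted(violations)}")
```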
3. Fail-Fast (Tolerance for Contradiction)
When a problem doesn't converge across multiple attempts and multiple AIs, I want the AI to declare it infeasible and hand me a graceful-degradation path. The AI's job is not to grind itself into the ground trying to solve everything. Its job is to return judgment to the human when judgment is what's actually needed. "I can't solve this, here's how to amputate it cleanly" is a better answer than "I'm trying again, please wait."
This is circuit-breaker behavior from distributed systems, applied to AI reasoning loops.
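A sketch of the breaker, with each `attempt` as a placeholder for one full try at the problem (a model, a reframing of the prompt, or both):

```python
# A circuit-breaker sketch for AI reasoning loops (hypothetical names).
from typing import Callable

class Infeasible(Exception):
    """The breaker is open: hand judgment back to the human."""

def with_breaker(attempts: list[Callable[[], str]], degradation_path: str) -> str:
    failures: list[str] = []
    for attempt in attempts:  # e.g. [try_claude, try_gemini, try_claude_reframed]
        try:
            return attempt()
        except Exception as exc:  # one failed or non-converging attempt
            failures.append(str(exc))
    # Don't grind. Declare infeasibility and name the clean amputation.
    raise Infeasible(
        f"{len(failures)} attempts failed. Suggested degradation: {degradation_path}"
    )
```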
4. Adversarial Review (the Devil's Advocate, made standing)
For any non-trivial proposal, I force the AI to argue against itself. Standing instructions — in CLAUDE.md, in custom instructions, in the system prompt — so I don't have to remember to ask. An assistant that only agrees with me is a single point of failure, not a colleague.
This is red-team review and RFC-style structured criticism, made automatic instead of episodic.
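One way to make it standing in code rather than in memory is to wrap every proposal call so a critique pass fires automatically; in this sketch, `ask()` is a hypothetical one-shot client stub, not a real SDK call:

```python
# A sketch of "standing" adversarial review. ask() is a hypothetical stub:
# every proposal automatically triggers a critique pass, so nobody has to
# remember to request one.
def ask(model: str, prompt: str) -> str:
    raise NotImplementedError("wire up your own client here")

def propose_with_devils_advocate(model: str, task: str) -> tuple[str, str]:
    proposal = ask(model, task)
    critique = ask(model,
        "Argue against the following proposal as a hostile reviewer. "
        "List the strongest objections, the failure modes, and what would "
        "have to be true for this to be the wrong choice:\n\n" + proposal)
    return proposal, critique  # the human reads both, every time
```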
5. The Human Router (Multi-AI Adversarial Review)
For decisions with real trade-offs, I run two AIs from different vendors against each other and stay in the loop as the routing authority. Not as supervisor, not as referee, but as the operator who decides which output goes where, and which conflicts get escalated. In my experience, the cross-vendor part matters: same-vendor "multi-agent" setups still share a worldview. Cross-vendor (e.g., Claude vs. Gemini) actually surfaces disagreement.
This is human-in-the-loop control plane thinking, where the data plane is "AIs argue" and the control plane is "human decides which arguments matter."
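A sketch of that control plane, again with a hypothetical `ask()` stub standing in for real clients; the data plane is the two answers plus cross-examination, and the routing decision stays with the human:

```python
# A control-plane sketch (ask() is the same hypothetical stub as above).
# The data plane is "two vendors answer and cross-examine"; the control
# plane is a human routing the result, not an automatic merge.
def ask(model: str, prompt: str) -> str:
    raise NotImplementedError("wire up your own clients here")

def human_route(task: str) -> str:
    a = ask("claude", task)  # vendor 1's answer
    b = ask("gemini", task)  # vendor 2's answer
    # Cross-examination: each model critiques the other's answer.
    a_on_b = ask("claude", f"Find the flaws in this answer to {task!r}:\n{b}")
    b_on_a = ask("gemini", f"Find the flaws in this answer to {task!r}:\n{a}")
    for label, text in (("A", a), ("B", b), ("A on B", a_on_b), ("B on A", b_on_a)):
        print(f"--- {label} ---\n{text}\n")
    verdict = input("route [a / b / escalate]: ")  # the human is the router
    if verdict == "a":
        return a
    if verdict == "b":
        return b
    raise RuntimeError("escalated: this conflict needs a decision, not a merge")
```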
I want to be clear: none of these are inventions of mine. The literature has Multi-Agent Debate, Self-Refine, Reflexion, Constitutional AI, and a growing body of human-in-the-loop research. I'm describing how I personally compose these ideas into a daily workflow, and writing down the specifics in case the composition is useful to someone.
Why "Cross-Domain"
I keep using this word and I should explain it.
The AI engineering conversation I see is heavily centered on the programmer's lens — code generation, dev tools, IDE integrations, prompt-to-PR workflows. That lens is real and valuable, and I'm not arguing against it.
But I came in from operations, and operations has its own logic. Change management. Verification gates. Runbooks. Rollback procedures. Skepticism toward dashboards that are too green. The instinct to ask "how would I know if this is silently broken?" before celebrating a green build.
When you bring those instincts to AI collaboration, you get a different shape of practice than you'd get if you came in from the dev side. Not better. Different. And I think the differences are worth writing down, because operations engineers, SREs, data engineers, business systems people, and infrastructure folks are all about to find themselves doing AI-assisted work — and the dev-centric playbook isn't going to fit them perfectly.
That's what "cross-domain" is pointing at. Not crossing one specific bridge, but recognizing that there are many bridges, and that each engineering discipline brings its own toolkit to the AI collaboration table.
What's coming in this series
(Links will appear as articles are published. Each piece is designed to stand alone — you don't need to have read this anchor page to follow any of them. This index is just here for the curious.)
- Entity Resolution case study — A practical build: deterministic two-system reconciliation with AI assistance, no business data leaving the local environment, ~99.2% recall on historical replay. Where I learned that operations discipline transfers cleanly to software work in unfamiliar domains.
- AI Collaboration Patterns — Five practices and five anti-patterns I picked up running AI-assisted operations work day to day. The kind of thing you only learn by getting burned.
- Alchemy Essay — A late-night thought experiment mapping AI coding onto the principles of Fullmetal Alchemist. The most polemical piece in the series; also the most personal.
- Domain-Native Governance — The architectural argument: why dev-centric AI governance feels insufficient from where I sit, and what a domain-native alternative might look like. The most theoretical piece in the series.
- Domain Logic First — A position piece: each engineering discipline has its own logic, and there might be value in adapting AI to it rather than the other way around.
- Field-Note Tips (ongoing) — Short pieces on individual operating habits: the CLAUDE.md preamble template, three-tier tool routing, telling fast evaluation from real evaluation, etc.
A note on tone
The Japanese versions of these articles tend to land in a softer register than the English ones. That's a translation choice; the underlying observations are the same. If you read both, the English will feel more direct and the Japanese more reflective. Neither is the "real" version — both are. I'm a different writer in each language, and I've stopped trying to flatten that.
A note on the time-shift
If you're reading this in 2027 or 2028 and any of these disciplines feel obvious by now, good. That means the field moved. The reason these notes exist is that I was figuring this out in early 2026 and wanted a record. I'd rather have written them down too early and looked dated than written them down too late and lost the messy details.
If you're reading this in 2026 and any of these disciplines feel new: I'm right there with you. We're figuring this out together.
Comments welcome. Particularly from:
- Engineers from non-dev domains who've started AI-assisted work
- Researchers working on multi-agent debate / adversarial review / human-in-the-loop systems
- Anyone running multi-AI cross-vendor protocols and willing to share what's worked or broken
- Anyone who thinks one of these five disciplines is wrong and wants to argue
- Anyone who has prior art (papers, blog posts, internal write-ups) for any of this — I'd genuinely like to read them