
What Operations Discipline Brings to AI-Assisted Coding: A Cross-Domain Field Guide

TL;DR

  • I moved from operations / systems engineering into the software side via AI collaboration. Part 1 of this series (the entity resolution case study) is the build; this is the methodology.
  • Five practices and five anti-patterns, filtered through an ops lens — but the lessons generalize.
  • Not "AI tips you've heard." Patterns that fall out naturally if you treat AI sessions like config reviews, runbooks, and validation procedures.
  • Each piece is paired with a real misstep I made building Part 1.
  • Each part of this series stands alone. Read in any order.

Why an "Operations Discipline" Lens

Operations engineers spend their careers internalizing four habits:

  • Plan before you build — designs, runbooks, change requests.
  • Verify before you declare done — validation procedures, post-change checks.
  • Document state — configs, design docs, postmortems.
  • Suspect numbers — every monitoring datapoint hides an artifact.

These habits transfer directly to working with AI coding assistants. The disciplines you learned debugging routers, filing change requests, and reviewing configs are the same ones that prevent AI sessions from sliding off the rails.

I'm framing this through ops because that's the lens I learned from. Most of these patterns generalize beyond ops — software engineers, data engineers, and SREs will recognize them. The ops version just happens to package them tightly.

Part 1: Five Practices

Practice 1 — Treat your CLAUDE.md (or system prompt) as a design-spec preamble

In ops, every change procedure has a preamble: prerequisites, scope, rollback steps, validation checks. Same energy in AI work.

CLAUDE.md is Claude Code's persistent instruction file. (Other assistants have equivalents — system prompts, custom instructions, etc.) Use it the way you'd use a runbook preamble:

```
## Operating principles
- Always plan before implementing.
- Confirm ambiguous instructions before coding.
- Always provide a counter-argument when proposing a design.
- Never report a metric without showing how it was measured.
- Distinguish "should work" from "actually verified to work."
```

Once written, every future session inherits these rules. You stop re-explaining yourself. This is the same template-then-reuse pattern that saves you from rewriting a runbook for every change window.

Practice 2 — Demand a Devil's Advocate, every time

Design reviews exist because groupthink kills production systems. Force the AI to argue against itself in every proposal.

Three asks I bake into every meaningful design conversation:

  1. What's the worst-case failure mode of this design?
  2. What use case did you not consider?
  3. Give me three reasons to reject this design.

Bake this requirement into your CLAUDE.md and you stop seeing pure agreement. An AI that only agrees with you is a single point of failure.

Practice 3 — Force ambiguous instructions to be confirmed before implementation

In ops requirements gathering, "implement loose specs" is a known disaster pattern. The same is true for AI sessions, where ambiguity gets resolved silently — and usually wrong.

Real example from Part 1: I said "treat the ID and the display name as a pair, match if either is present." The AI interpreted that as two independent search keys. Half the matcher had to be rebuilt.
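To make the ambiguity concrete, here is a minimal sketch of the two readings. The field names (entity_id, display_name), the exact-equality matching, and the "intended" reading are invented for illustration; Part 1's real matcher is more involved.

```python
# Hypothetical illustration of one instruction with two valid implementations.
# Field names and exact-equality matching are invented for this sketch.

def matches_as_pair(record: dict, candidate: dict) -> bool:
    """Reading A (one plausible intended reading): ID and name travel together
    as one key. Compare whichever fields are present on both sides; all must agree."""
    checks = []
    for field in ("entity_id", "display_name"):
        if record.get(field) is not None and candidate.get(field) is not None:
            checks.append(record[field] == candidate[field])
    return bool(checks) and all(checks)

def matches_as_independent_keys(record: dict, candidate: dict) -> bool:
    """Reading B (the interpretation the AI ran with): ID and name are two
    independent search keys, and a hit on either one alone counts as a match."""
    for field in ("entity_id", "display_name"):
        if record.get(field) is not None and record.get(field) == candidate.get(field):
            return True
    return False
```

Both functions are defensible implementations of the same English sentence, which is exactly why the instruction needed a confirmation question before any code was written.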

Lesson, written into CLAUDE.md: if an instruction has two valid readings, ask which one I mean before writing code.

This is the same habit as a senior network engineer asking "do you mean inbound or outbound?" before touching the firewall.

Practice 4 — Separate "theoretical evaluation" from "real-world evaluation"

Ops engineers know the gap between "the spec says it works" and "I've watched the LED light up." The same gap exists in AI work, and it's wider than you'd think.

Real example from Part 1: the AI claimed about 99.2% recall based on pattern analysis of past data. I asked for a real run on the actual dataset. The measured recall came back at 55%.

The lesson is not "the AI lied." The lesson is that pattern-analysis predictions are not the same as a real execution result. Every claim that sounds like a measurement deserves the question: was this measured, or estimated? If estimated, label it that way and move on; if measured, show the run.
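One lightweight way to enforce that distinction is to make every reported number carry its provenance. A minimal sketch follows; the MetricClaim structure and the evidence strings are invented for illustration, not taken from Part 1 (only the 99.2% and 55% figures are).

```python
# Sketch of a "label your numbers" convention: every metric states whether it
# was measured or estimated, and how. Structure and field names are invented.
from dataclasses import dataclass

@dataclass
class MetricClaim:
    name: str
    value: float
    basis: str      # "measured" or "estimated"
    evidence: str   # how the number was produced

claims = [
    MetricClaim("recall", 0.992, "estimated",
                "pattern analysis of past data; no execution"),
    MetricClaim("recall", 0.55, "measured",
                "verification script run against the real dataset"),
]

for c in claims:
    print(f"{c.name} = {c.value:.1%}  [{c.basis}: {c.evidence}]")
```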

Practice 5 — Have the AI write its own verification scripts

If the AI says "this code achieves 99% recall," ask it to write the script that measures that recall. Then run it.

This converts:

  • A claim → a script.
  • A script → an audit trail.
  • An audit trail → reproducibility.

It is the same pattern as runbooks: a change procedure and a validation procedure, always paired. The validation script becomes a permanent artifact you can hand to the next person — or to your future self when something regresses.
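Here is a minimal sketch of what such a verification script can look like. The file names, the CSV layout, and the assumption that ground truth is a labeled (record_id, entity_id) file are all invented for illustration; the pattern is that "99% recall" becomes a command whose output you can rerun and audit.

```python
# Sketch of an AI-written verification script that you run yourself.
# File names and the CSV layout are assumptions for this illustration.
import csv

def load_pairs(path: str) -> set[tuple[str, str]]:
    """Load (record_id, entity_id) pairs from a CSV with those two columns."""
    with open(path, newline="") as f:
        return {(row["record_id"], row["entity_id"]) for row in csv.DictReader(f)}

def measure_recall(ground_truth_path: str, predictions_path: str) -> float:
    """Recall = labeled true pairs the matcher actually found / all labeled true pairs."""
    truth = load_pairs(ground_truth_path)
    predicted = load_pairs(predictions_path)
    found = len(truth & predicted)
    recall = found / len(truth) if truth else 0.0
    print(f"true pairs: {len(truth)}  found: {found}  recall: {recall:.1%}")
    return recall

if __name__ == "__main__":
    measure_recall("ground_truth.csv", "matcher_output.csv")
```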

Part 2: Five Anti-Patterns

Anti-Pattern 1 — "Just build me a tool"

The AI equivalent of "fix the network." Without scope, the AI invents one. Worse, it pursues the invented scope confidently, so you get the wrong direction executed at full speed.

Treat session start like requirements gathering: rough goal, key constraints, what's explicitly out of scope. Five minutes of scoping saves five hours of rework.

Anti-Pattern 2 — Trusting headline numbers without verifying composition

"99% recall" sounds great until you discover it was measured on cherry-picked rows, with the test set leaking into training data, on a metric that doesn't reflect the actual user experience.

Before reporting any number, ask:

  • How was this measured?
  • On what data?
  • Under what conditions?
  • With what biases?

This is the same suspicion you apply to a monitoring dashboard reporting zero alerts: is the agent actually reporting, or is it dead?

Anti-Pattern 3 — Throwing raw error text at the AI without context

"It doesn't work" → "Why?"

In ops you'd never debug a router by saying "it's down." You'd attach: configuration, status output, syslog excerpts, behavior of connected devices.

Same here. The AI cannot infer your environment. Show the command, the actual output, the expected behavior, and the deviation. Treat each interaction like a bug report you'd file with a vendor.

Anti-Pattern 4 — Sending business data to an AI without compliance review

Default assumption: any data you put into a prompt may be retained, indexed, or used in training, regardless of what the vendor's marketing copy says.

The operational habit is straightforward — redact, mask, or synthesize. The same instinct that keeps you from posting customer IPs to Stack Overflow should stop you from pasting customer rows into a prompt.
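A minimal sketch of that habit in code, assuming invented field names and a hash-based masking scheme (in practice you would manage the salt as a secret and classify sensitive fields properly):

```python
# Sketch of masking rows before anything reaches a prompt.
# Field names and the masking scheme are invented for illustration.
import hashlib

SENSITIVE_FIELDS = {"customer_id", "email", "ip_address"}
SALT = "replace-with-a-secret-salt"  # never hard-code a real salt

def mask_value(value: str) -> str:
    """Replace a sensitive value with a stable pseudonymous token so rows can
    still be joined and discussed without exposing the real value."""
    digest = hashlib.sha256((SALT + value).encode()).hexdigest()[:10]
    return f"masked_{digest}"

def mask_row(row: dict) -> dict:
    return {k: mask_value(str(v)) if k in SENSITIVE_FIELDS else v
            for k, v in row.items()}

# Only the masked version is ever pasted into an AI session.
sample = {"customer_id": "C-10293", "email": "a@example.com", "region": "EMEA"}
print(mask_row(sample))
```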

(Part 1 covers this pattern in depth as it applied to the entity resolution build. The short version: deterministic logic touches the data; the AI touches only code, design notes, and synthetic samples.)

Anti-Pattern 5 — Stopping at "it works"

"The code runs" is not the same as "I understand why it runs."

The ops version of this is: a configuration that worked once but I can't explain is a future incident.

Make the AI explain why the working solution actually works. If neither of you can defend the design after one cycle of follow-up questions, treat it as a yellow flag — not a green light. Ship explainable code; the unexplained kind owns you on the day it breaks.

Wrap-Up

The pattern across all ten:

  • Apply ops discipline to AI sessions.
  • Treat AI claims like vendor claims — verify them in your environment.
  • Treat AI conversations like change windows — preamble, scope, verification, postmortem.
  • Treat AI outputs like config diffs — explain them or reject them.

What I'm explicitly not claiming:

These are not unique to operations engineers. They generalize. The ops lens just happens to package them tightly because the discipline is already there.
These are not the only practices. Ten is a lossy compression. The list you'd build for your environment may differ in detail.
  • These cover the build phase of AI-assisted work — the session-time discipline. Day 2 operations (monitoring AI-generated code in production, detecting silent drift, incident response when AI-assisted changes break) is its own discipline and deserves its own article. The patterns here are necessary but not sufficient for production AI usage.

The big idea: AI doesn't replace engineering judgment — it amplifies it. Amplifying lazy judgment produces more bad code, faster. Amplifying disciplined judgment produces clear, audited, defensible work.

What's Next

A future part of this series will cover how the design review for these articles actually happened — a Multi-AI Adversarial Review (MAAR) loop where Claude and a second AI argued against each other under human routing. That's the meta-process behind both Part 1 and this one.

If you came in via this article, Part 1 is the concrete build that produced these lessons.

Comments welcome — particularly:

The practices or anti-patterns you'd add.
  • Cross-domain engineering experiences (any technical background → another).
  • Cases where ops discipline did not transfer cleanly to AI work.
  • Rollback strategies when an AI-assisted change corrupts your codebase or repo state.
  • Day 2 operations practices for AI-generated code in production (monitoring, drift detection, incident response).
