DEV Community

Alexander van Rossum

Posted on • Originally published at mipyip.com

The AI Perimeter: Where Automation Should End and Judgment Should Begin

Everyone posting about AI is selling it. The frameworks, the workflows, the "10x your productivity" threads — all of it points in one direction. Nobody builds a following by telling you to slow down.

So here's my credibility pitch: I use AI agents for about 95% of my development work. I've shipped features, caught security vulnerabilities, and managed entire sprint cycles with AI tooling that most people posting about it haven't opened. And I'm telling you there are things I won't use it for — not because I'm hedging, but because I've pushed the tool far enough to know where it breaks.

I make that decision a dozen times a week, and most of the time I don't even notice I'm making it. That's not instinct — it's pattern recognition built from doing this work every day. The judgment becomes automatic. And that judgment, not the tooling itself, is the actual skill.

You can't water a seed that doesn't exist

I've tried using AI to generate ideas from scratch. Not refine an idea. Not pressure-test a concept. Generate one — from nothing.

It doesn't work.

AI is extraordinary at expanding, refining, challenging, and structuring ideas. Hand it a rough concept and it'll find angles you missed, surface contradictions, and help you think through implications faster than you could alone. But it needs raw material. Something rough, something human, something that came from your context and your pattern recognition. Without that, you get the most statistically average version of whatever you asked for.

The seed has to be yours. AI is an amplifier. Without a signal, it amplifies noise.

Every project I've shipped started with a human idea — scribbled in Excalidraw, talked through with a friend, or captured in a voice memo at 2am. These are the same "scaffolding" patterns I've used to manage state-loss in my own brain; the AI pipeline just turns that scaffolding into working software. But the pipeline needs an input. If you skip the human part, you get sophisticated mediocrity — technically correct, architecturally sound, and completely devoid of the insight that would have made it worth building.

Dropdown fields for grief

There's a scene in Leviathan Wakes — the novel that became The Expanse — where Detective Miller has to write a condolence letter. The system gives him a form:

To the [husband / wife / mother / father] of [victim name]. We are sorry to inform you that [he / she] was killed aboard [ship / station] on [date]. Please accept our condolences.

Dropdown fields for grief. Efficient. Covers all the cases. Soulless.

That's what happens when you fully automate emotional communication. And the instinct to reach for AI here is understandable — writing a difficult email is hard, and the blank page is intimidating. But "hard" is exactly the point. The difficulty is the signal that a human needs to be doing this.

Where AI can help with emotional communication is in the middle of the process, not at the beginning or end. You write the first draft — the messy, human, probably-too-long version that says what you actually mean. Then you run it through AI for structure: tighten the phrasing, catch the paragraph that buries the point, find the sentence that says two things when it should say one. Then you do a final pass as a human, because the AI's version will be cleaner but might have smoothed away the part that actually mattered.

Start human. Refine with AI. Finish human. Skip any of those steps and you get either a mess or a template — and people can tell the difference.

The compliance line

I'd use AI to manage a Python 2 to Python 3 migration. Identify deprecated patterns, rewrite syntax, flag compatibility issues across a codebase. Bounded, verifiable, and the cost of a missed edge case is a failing test, not a breach. (It still needs human review — even if you use an adversarial agent for code review, the human makes the final call.)
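As a sketch of what "identify deprecated patterns" looks like in practice, here's a minimal scanner for a few Python 2 idioms. The pattern list is illustrative, not exhaustive; a real migration would lean on dedicated tooling plus the test suite as the safety net, with a human making the final call.

```python
import re

# Illustrative only: a handful of Python 2 idioms an AI-assisted migration
# would flag for human review. Not a complete catalogue.
PY2_PATTERNS = {
    r"\bprint\s+[^(\s]": "print statement (use the print() function)",
    r"\bxrange\s*\(": "xrange() was removed (use range())",
    r"\.has_key\s*\(": "dict.has_key() was removed (use the 'in' operator)",
    r"\bexcept\s+\w+\s*,\s*\w+\s*:": "'except E, e:' syntax (use 'except E as e:')",
}

def flag_py2_patterns(source: str) -> list[tuple[int, str]]:
    """Return (line_number, explanation) for each suspected Python 2 idiom."""
    findings = []
    for lineno, line in enumerate(source.splitlines(), start=1):
        for pattern, why in PY2_PATTERNS.items():
            if re.search(pattern, line):
                findings.append((lineno, why))
    return findings

legacy = "for i in xrange(10):\n    print i\n"
for lineno, why in flag_py2_patterns(legacy):
    print(f"line {lineno}: {why}")
```

The point of the exercise: every finding is cheap to verify, and a miss shows up as a failing test rather than an incident.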

I would not use AI to rotate secrets.

I would not upload a CSV of client data to an LLM and ask it to generate invoices. Not because the model can't do the math — because a hallucinated line item creates a compliance violation and a client who will never trust you again. The financial services sector is already grappling with this — inaccurate AI outputs in regulated environments don't just create errors, they create regulatory exposure. Invoicing requires auditability, and "the AI did it" is not a line item your accountant can reconcile.

I would not feed PII into a public AI system. Full stop. This isn't about whether the model will get the answer right — it's about what happens to that data after it leaves your system. LLMs can memorize and regurgitate fragments of their training data, and unless you're on an enterprise plan with contractual guarantees about data handling, your client's personally identifiable information is potentially entering a training pipeline you don't control and can't audit. That's not an AI problem. That's a data governance problem, and it exists whether the output is correct or not.
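When data genuinely must touch an external model, one mitigation is redacting obvious identifiers before anything leaves your system. This is a deliberately minimal sketch: the two patterns below cover only a fraction of real PII, and no regex pass substitutes for an enterprise agreement with contractual data-handling guarantees, or for simply not sending the data at all.

```python
import re

# Minimal, illustrative redaction pass. These patterns are nowhere near
# complete PII coverage; real data governance needs a vetted tool or a
# contractual guarantee, not a two-entry regex list.
REDACTIONS = [
    (re.compile(r"[\w.+-]+@[\w-]+\.[\w.]+"), "[EMAIL]"),
    (re.compile(r"\b\d{3}-\d{2}-\d{4}\b"), "[SSN]"),
]

def redact(text: str) -> str:
    """Replace obvious PII patterns before text leaves your system."""
    for pattern, placeholder in REDACTIONS:
        text = pattern.sub(placeholder, text)
    return text

print(redact("Contact jane.doe@example.com, SSN 123-45-6789"))
```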

The line isn't about capability. Modern models can do all of these things technically. The line is about what happens when they're wrong — and, in the case of PII, what happens even when they're right. A botched Python migration produces a failing test suite. A botched secret rotation produces a security incident. A hallucinated invoice produces a compliance violation. Client data in a training pipeline produces a breach of trust that no output quality can justify.

And these aren't edge cases waiting to be patched. Hallucinations are an inherent property of how language models work — they predict the most statistically likely next token, not the most factually correct one. That gap doesn't close with better prompts. It closes with governance, verification, and human oversight. Treating hallucinations as bugs to be fixed is how organizations build false confidence in systems that need guardrails.

The rule: if the cost of a wrong answer exceeds the cost of doing it manually, the AI shouldn't be doing it unsupervised. "Probably right" is fine for code review. It's not fine for anything where "probably" means "we might get sued."

This is the same principle behind human-in-the-loop design — and behind my own workflow. The AI generates. The human executes. Not because the AI can't execute — because the gap between "can" and "should" is exactly where the expensive mistakes live.

Voice is collaboration, not delegation

Every post on this site started as something I wrote. AI expanded it, tightened the structure, caught weak arguments, and helped me think through what I actually meant. But the voice is mine. The opinions are mine. The experiences are mine.

If you hand an AI "write me an article about quantum mechanics," you'll get the most average article about quantum mechanics that has ever existed. Not wrong. Not interesting. Think of it as convergence to the mean — the model produces the statistical center of everything it's seen on that topic, and the statistical center of anything is, by definition, unremarkable. It's the same reason every AI-generated LinkedIn post sounds like every other AI-generated LinkedIn post.

And this isn't just an aesthetic problem. GenAI is designed to provide the most likely output, which means it defaults to confident, well-structured prose even when the thinking behind it is shallow. Readers trust polished writing more than they should. The result is content that sounds more authoritative than it deserves to be — and that false authority is its own kind of hallucination.

Voice requires the same pattern as emotional communication: start human, refine with AI, finish human. The AI needs to know what you sound like, what you care about, what hills you'll die on. That context doesn't come from a single prompt — it comes from governance documents that encode your standards, your patterns, your constraints.

It comes from working with the tool long enough that you know its blind spots.

The distinction matters because the audience can always tell. "AI-generated content" and "AI-assisted content" are not the same thing. One reads like a template. The other reads like a person who had help organizing their thoughts.

Three questions before you automate

Before I hand any task to an AI agent, I ask three questions:

Can I verify the output? If I can check the work faster than I can do the work, AI is a net win. If verification requires as much expertise and time as the original task, I've added a step without saving anything.

Is the cost of a wrong answer low? Code review that misses something means I catch it later. A billing error means a client relationship is damaged. A compliance failure means lawyers. Match the automation level to the stakes.

Does sufficient context exist in the system? AI works when the governance documents provide enough structure for a correct first-pass implementation. If the context is ambiguous, incomplete, or doesn't exist yet — the agent will fill in the gaps with confident guesses, and you won't always catch them.

If any answer is "no," the task stays manual. Not forever — sometimes the fix is building the context that makes automation safe. But automating a task that fails these checks isn't efficiency. It's introducing risk and calling it productivity.
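The three checks can be read as a simple gate. All names here are hypothetical; the sketch just makes the "any no means manual" rule explicit.

```python
from dataclasses import dataclass

# Hypothetical names: a sketch of the three-question gate, not a framework.
@dataclass
class Task:
    verifiable: bool      # Can I check the output faster than doing the work?
    low_stakes: bool      # Is the cost of a wrong answer low?
    context_exists: bool  # Do the governance docs give enough structure?

def should_automate(task: Task) -> bool:
    """All three answers must be yes, or the task stays manual."""
    return task.verifiable and task.low_stakes and task.context_exists

# Examples from the text, scored as I would score them:
assert should_automate(Task(True, True, True))        # Py2 -> Py3 migration
assert not should_automate(Task(True, False, True))   # invoicing client data
assert not should_automate(Task(False, False, True))  # rotating secrets
```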

Where it works

This isn't an anti-AI post. My entire workflow depends on AI tooling. Well-bounded transformation work, adversarial code review against defined standards, any task where governance documents provide sufficient context for a correct first pass — these are places where AI genuinely accelerates. And once the seed exists, AI is the best thinking partner most people have ever had access to. It doesn't get tired, doesn't get defensive, and will argue the other side of any position if you ask it to.

Tool selection is the expertise

A good chef knows when to use the food processor and when to use the knife. The processor is faster. The knife gives you control. Using the wrong one in the wrong place doesn't make you efficient — it makes you someone who doesn't understand their kitchen.

AI is the most powerful tool most of us have ever had access to. That makes knowing when not to use it more important, not less. The capability is not the question. The judgment is.

If your AI strategy is "use AI for everything," you don't have a strategy. You have enthusiasm. And enthusiasm without judgment is how you end up with dropdown fields for grief.


Sources:
LLM Hallucinations: What Are the Implications for Financial Institutions? — BizTech Magazine

LLM Data Privacy: Risks, Challenges & Best Practices — Lasso Security

AI Hallucinations, RAG and Human-in-Loop Risk Mitigation — DataNucleus

What Is Human-in-the-Loop? — IBM

What Are AI Hallucinations? — PwC
