The noise around AI right now is deafening. Every day there is a new model, a new framework, and a new promise that “everything has changed.”
I’ve worn many hats in my career — from Senior Engineer to Architect, and now as a Group Engineering Manager. Across all those roles, my philosophy has remained the same: Scale by Subtraction.
Real scalability doesn’t come from adding more tools, more complexity, or more “magic.” It comes from stripping away the noise to focus on the signal. When we apply this to AI, the conversation changes. We stop talking about “AI Magic” and start talking about engineering trade-offs.
Here are four areas where I see teams getting lost in the hype, and how we can apply “Scale by Subtraction” to build systems that actually work.
1. Context Engineering vs. The “Infinite Window” Trap
The Hype: Context windows are massive now (1M+ tokens). RAG (Retrieval Augmented Generation) is dead. Just dump your entire documentation, codebase, and logs into the prompt and let the model figure it out.
The Reality:
Relying on massive context windows is often lazy engineering. If I dump all of Wikipedia into a model just to find a specific caching strategy, I am introducing massive noise. The model hallucinates connections that aren't there, latency spikes, and costs explode. The quality of the output depends strictly on the quality of the input.
The Strategy (Scale by Subtraction):
We are seeing a false dichotomy between “Long Context” and “Complex RAG Pipelines.”
Building your own end-to-end RAG pipeline (managing vector databases, re-ranking logic, index synchronization) is undifferentiated heavy lifting. It’s tough to maintain. However, “Scale by Subtraction” doesn’t mean we abandon retrieval; it means we subtract the infrastructure, not the context.
I prefer consuming RAG as a managed service (via Azure, Google, etc.) rather than building it from scratch.
But I insist on Context Engineering.
We must curate the input. We ensure the model receives only the specific data it needs to answer the question. Don’t use long context windows as a crutch for bad data curation.
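Here is a minimal sketch of what that curation looks like in practice: rank candidate chunks against the question, keep only the most relevant ones under a budget, and build a tight prompt. The keyword-overlap scoring is a deliberately naive placeholder; in a real system the ranking would come from a managed retrieval service.

```python
# Sketch of context curation: select only the chunks relevant to the question
# and cap the total size, instead of dumping everything into the prompt.

def score(question: str, chunk: str) -> float:
    # Naive keyword-overlap relevance; a managed retriever would replace this.
    q_terms = set(question.lower().split())
    c_terms = set(chunk.lower().split())
    return len(q_terms & c_terms) / (len(q_terms) or 1)

def curate_context(question: str, chunks: list[str], max_chars: int = 2000) -> str:
    ranked = sorted(chunks, key=lambda c: score(question, c), reverse=True)
    selected, used = [], 0
    for chunk in ranked:
        if used + len(chunk) > max_chars:
            break
        selected.append(chunk)
        used += len(chunk)
    return "\n---\n".join(selected)

question = "Which caching strategy do we use for the product catalog?"
docs = [
    "Catalog reads are served from a Redis cache with a 5 minute TTL.",
    "The onboarding guide explains how to request laptop hardware.",
    "Cache invalidation for the catalog is triggered by the pricing service.",
]

prompt = (
    "Answer using only the context below.\n\n"
    f"Context:\n{curate_context(question, docs)}\n\n"
    f"Question: {question}"
)
print(prompt)
```

The point is not the scoring function; it is that the prompt contains two relevant sentences instead of the whole wiki.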
2. The “Lazy Tax” vs. Critical Systems
The Hype: You must optimize everything. Use model routers and fine-tune small language models (SLMs) for every micro-task to save money.
The Reality:
Premature optimization is still the root of all evil. If I am writing a blog post or fixing my own performance review, I am happy to pay the “Lazy Tax.” I will use the biggest, smartest, most generic model available. My time is more expensive than the compute tokens.
The Strategy (Scale by Subtraction):
The calculation changes completely when we move from Personal Productivity to Critical Systems.
Take a Live Site Incident. If we are building a troubleshooting agent, I cannot afford “Generic AI.” I cannot just dump all troubleshooting guides into a context window and hope for the best. That is irresponsible. Generic AI guesses: it hallucinates plausible-sounding but wrong commands.
In this scenario, we subtract the Variance.
We build a system that acts in a controlled fashion. We restrict the context to only the specific error logs and the exact relevant documentation. For high-stakes systems, we don’t want a generalist; we want a specialist with a narrow, controlled scope.
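A minimal sketch of that narrowing, assuming a hypothetical error-code convention and runbook mapping: the model only ever sees the log lines that match the active alert and the single runbook entry mapped to that error.

```python
# Sketch of "subtracting variance" for an incident agent: the context is
# restricted to the matching log lines and one runbook entry, nothing else.
# Error codes, runbook keys, and the log format are hypothetical.

RUNBOOKS = {
    "DB_CONN_TIMEOUT": "Runbook 12: check connection pool saturation before restarting.",
    "QUEUE_BACKLOG": "Runbook 31: scale consumers; do not purge the queue.",
}

def build_incident_context(error_code: str, raw_logs: list[str]) -> str:
    # Keep only the log lines for this error code; drop everything else.
    relevant_logs = [line for line in raw_logs if error_code in line]
    runbook = RUNBOOKS.get(error_code)
    if runbook is None:
        raise ValueError(f"No runbook for {error_code}; escalate to a human.")
    return (
        f"Runbook:\n{runbook}\n\n"
        "Relevant log lines:\n" + "\n".join(relevant_logs[-20:])
    )

logs = [
    "10:01 payments DB_CONN_TIMEOUT pool=50/50",
    "10:01 search index rebuild finished",
    "10:02 payments DB_CONN_TIMEOUT retries exhausted",
]
print(build_incident_context("DB_CONN_TIMEOUT", logs))
```

If there is no mapped runbook, the code refuses and escalates. A specialist with a narrow scope is allowed to say “I don’t know.”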
3. Security First: The Analyst vs. The Judge
The Hype: Autonomous Agents are the future. Give the AI your API keys, let it read your logs, and let it fix the production bug while you sleep.
The Reality:
Giving an LLM “superpowers” (unchecked execute permissions) is not innovation; it is a security vulnerability. LLMs are probabilistic engines. They make mistakes. If an AI hallucinates a command like DROP TABLE or restarts the wrong cluster, the speed of automation becomes the speed of destruction.
The Strategy (Scale by Subtraction):
Security First.
We subtract the autonomy but keep the intelligence.
I view the AI as a high-speed Analyst, not a decision-maker.
- AI Analyzes: It scans the logs and pinpoints the issue faster than a human can.
- AI Proposes: It says, “This is the issue. I recommend restarting Service X. Shall I proceed?”
- Human Decides: I review the logic. If it makes sense, I click “Yes.”
The AI is the Investigator. The Human is the Judge and Executioner. We do not give the system “superpowers” to act alone; we give it the power to inform. The “Human in the Loop” remains the ultimate firewall.
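A minimal sketch of that firewall, with hypothetical stand-ins for the model call (propose_fix) and the remediation (restart_service): the only code path that executes anything runs through an explicit human confirmation.

```python
# Sketch of the Analyst/Judge split: the model reads and proposes,
# but nothing executes without the human saying yes.

def propose_fix(error_code: str) -> dict:
    # Placeholder for the AI "Analyst": it recommends, it never acts.
    return {"action": "restart_service", "target": "payments", "reason": error_code}

def restart_service(name: str) -> None:
    print(f"Restarting {name}...")  # stand-in for the real operation

def human_approves(proposal: dict) -> bool:
    answer = input(
        f"Proposed: {proposal['action']} on {proposal['target']} "
        f"({proposal['reason']}). Proceed? [y/N] "
    )
    return answer.strip().lower() == "y"

proposal = propose_fix("DB_CONN_TIMEOUT")
if human_approves(proposal):  # the human is the judge
    restart_service(proposal["target"])
else:
    print("Proposal rejected; nothing was executed.")
```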
4. Moving Beyond Vanity Metrics
The Hype: Just let the AI write your unit tests. It can generate 100% code coverage in seconds. If the output looks good (the ‘Vibe Check’), ship it.
The Reality:
AI is excellent at gaming metrics (Goodhart’s Law). If I ask for code coverage, it will give me tests that run but assert nothing meaningful. If I ask for a blog post, it will give me perfect grammar but zero insight. Relying on “Vibe Checks” creates a codebase that is technically “tested” but functionally fragile.
The Strategy (Scale by Subtraction):
The Human in the Loop isn’t just a reviewer; they are the Coach.
I don’t just let the AI “generate code.” We must first subtract the Vanity Metrics.
- Bad Goal: “Write tests to get 80% coverage.”
- Good Goal: “Write tests that specifically validate edge cases X, Y, and Z.”
My role shifts from typing the syntax to defining the acceptance criteria. We explain the goal, safeguard the output, and ensure the AI isn’t just filling space. We subtract the manual labor of writing boilerplate, but we never subtract the responsibility of defining what “quality” actually means.
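To make the contrast concrete, here is a small sketch using a hypothetical apply_discount function: the first test exists only to move the coverage number, the second set pins down the edge cases we actually care about.

```python
# Sketch: a coverage-chasing test vs. tests that validate real edge cases.
import unittest

def apply_discount(price: float, percent: float) -> float:
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return round(price * (1 - percent / 100), 2)

class VanityTest(unittest.TestCase):
    def test_runs(self):
        apply_discount(100, 10)  # executes the code, asserts nothing

class EdgeCaseTest(unittest.TestCase):
    def test_zero_percent_is_identity(self):
        self.assertEqual(apply_discount(100, 0), 100)

    def test_full_discount_is_free(self):
        self.assertEqual(apply_discount(100, 100), 0)

    def test_rejects_negative_percent(self):
        with self.assertRaises(ValueError):
            apply_discount(100, -5)

if __name__ == "__main__":
    unittest.main()
```

Both versions make the coverage report look green; only one of them will catch a regression.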
The Bottom Line
AI is powerful, but it is not a replacement for engineering discipline. As leaders, our job is to subtract the hype and focus on the architecture. We use AI to remove toil, not to remove thinking.
