Paulo Victor Leite Lima Gomes

Posted on May 5

Tactical Debt Is the Silent Killer of Engineering Velocity

#ai #softwareengineering #opinion #devops

There is a kind of engineering debt that does not show up in static analysis, does not fail your test suite, and will not be fixed by refactoring a bad class.

It lives in the way the team operates.

Nobody owns the service. Deployments need a human ritual. The sprint plan dies on Tuesday because production is on fire again. The one person who understands payments is on vacation and suddenly everyone is politely pretending not to panic.

This is tactical debt.

And I think it destroys more engineering velocity than most teams are willing to admit.

Technical debt gets all the attention because it is visible to engineers. You can point to the messy module. You can complain about the missing tests. You can open an IDE and feel the pain directly.

Tactical debt is sneakier. It is the boulder chained to the team while everyone is still debating whether the codebase is clean enough.

The cruel part is that tactical debt absorbs productivity gains.

You can improve your test suite. You can adopt better tooling. You can hire strong engineers. You can use AI to write code faster.

But if the path from idea to production is full of unclear ownership, manual steps, approval mazes, status theater, and tribal knowledge, the extra productivity just gets converted into waiting.

AI may help you write the pull request in twenty minutes.

Tactical debt makes sure it still takes three weeks to reach production.

tactical debt is not technical debt

This distinction matters.

Technical debt is about the shape and quality of the software system. Tactical debt is about the shape and quality of the operating system around the software system: ownership, process, and the way work moves.

They are related, but they are not the same thing.

You can have ugly code and a high-velocity team if ownership is clear, deployment is automated, decisions are fast, and operational feedback loops are healthy.

You can also have a beautifully designed codebase trapped inside an organization where every change requires five meetings, three approvals, a release coordinator, and someone named Rodrigo who is the only person allowed to touch production because of an incident in 2021.

The code may be clean.

The team is still slow.

This is why some technical-debt programs fail. Teams spend a quarter improving internals and then wonder why delivery still feels stuck. The answer is uncomfortable: the bottleneck was never only in the code.

The bottleneck was in the way work moves.

Technical debt usually asks: “Is the system easy to change?”

Tactical debt asks: “Can the team actually get a change safely into the hands of users?”

Those are different questions. Mature engineering organizations need to care about both.

unclear ownership turns small questions into archaeology

A healthy system has an answer to a boring question: who owns this?

Not in an abstract Confluence sense. In the practical sense.

Who gets paged? Who approves risky changes? Who understands the business constraints? Who can decide whether this API behavior is intentional or accidental?

When ownership is unclear, simple work becomes detective work.

A product manager asks for a small change in a service. The engineer opens the repository, checks the README, sees that it was last meaningfully updated two years ago, asks in Slack, gets three conflicting answers, and eventually discovers that the service was built by a team that no longer exists.

Now the ticket is not “change this behavior.”

The ticket is “perform organizational archaeology until someone is brave enough to say yes.”

This is tactical debt.

And it is expensive because it taxes every future change, not just the first one.
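One way to make "who owns this?" answerable before anyone has to ask in Slack is to keep ownership as data and check it automatically. The sketch below assumes a hypothetical service catalog: a plain dictionary that maps each production service to its owning team and its on-call rotation. The `Ownership` record, its field names, and the services are illustrative and do not come from any particular tool.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ownership:
    """Hypothetical ownership record for one production service."""
    team: Optional[str]    # team accountable for the service
    oncall: Optional[str]  # rotation that gets paged when it breaks


def find_orphans(catalog: dict[str, Ownership]) -> list[str]:
    """Return services (sorted by name) lacking an owning team or an on-call rotation."""
    orphans = []
    for service in sorted(catalog):
        owner = catalog[service]
        if not owner.team or not owner.oncall:
            orphans.append(service)
    return orphans


# Illustrative catalog; in practice it might live in a versioned file next to the code.
catalog = {
    "checkout-api": Ownership(team="payments", oncall="payments-primary"),
    "legacy-pricing": Ownership(team=None, oncall=None),
    "settlement-batch": Ownership(team="finance-platform", oncall=None),
}

print(find_orphans(catalog))  # ['legacy-pricing', 'settlement-batch']
```

Run as a CI check, a script like this could turn a missing owner into a failing build today rather than an archaeology project six months from now.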

manual work is latency disguised as caution

Manual work often enters the system with good intentions.

A production deploy needed care. A migration was risky. A release checklist caught one important mistake once. So the team kept the ritual.

Then the ritual became normal.

Now a deployment requires someone to click through a checklist at midnight, update a spreadsheet, paste output into a channel, wait for a human approval, run a script from their laptop, and hope the VPN behaves.

Everyone agrees this is “safer.”

Often it is not.

Manual work is inconsistent automation executed by tired humans.

It adds latency, creates hidden dependencies, and makes releases emotionally expensive. When deployment hurts, teams batch changes. When teams batch changes, releases get riskier. When releases get riskier, the organization adds more manual control.

The loop feeds itself.
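A toy model shows why batching makes releases riskier. Assume, purely for illustration, that each change independently has a small probability `p` of being bad. A release of `n` batched changes then contains at least one bad change with probability `1 - (1 - p) ** n`, and the bigger the batch, the harder it is to find which change was at fault. Real changes are not independent, so treat this as intuition, not a forecast.

```python
def batch_failure_probability(p: float, n: int) -> float:
    """Probability that a release of n changes contains at least one bad change.

    Simplifying assumption: every change fails independently with probability p.
    """
    if not 0.0 <= p <= 1.0:
        raise ValueError("p must be between 0 and 1")
    if n < 0:
        raise ValueError("n must be non-negative")
    return 1.0 - (1.0 - p) ** n


# Assumed 2% chance that any single change is bad.
for batch_size in (1, 5, 20):
    risk = batch_failure_probability(0.02, batch_size)
    print(f"{batch_size:>2} changes per release -> {risk:.1%} chance of a bad release")
```

With that assumed 2% per-change rate, a single-change release is bad 2% of the time, while a 20-change release has roughly a one-in-three chance of containing at least one bad change.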

Good automation is not about moving fast recklessly. It is about making the safe path the easy path.

If shipping requires heroics, velocity will eventually collapse into negotiation.

status meetings are usually a symptom

I am not anti-meeting. Some meetings are useful. Talking to humans is not a failure mode.

But many status meetings are not coordination. They are compensation.

They exist because the actual system for communicating progress does not work.

The board is stale. The tickets are vague. Decisions are buried in private chats. Dependencies are invisible. Nobody trusts async updates. So the organization creates a recurring ceremony where everyone verbally reconstructs reality.

This is synchronous theater.

It feels productive because people are talking. But the output is often just temporary shared awareness that expires after lunch.

The fix is rarely “cancel all meetings.” The fix is making the work legible.

Clear ownership. Written decisions. Useful tickets. Visible dependencies. Async updates that people actually trust.

Once the system is legible, meetings can return to what they are good for: judgment, conflict, alignment, and decisions.

Not reading the Jira board out loud.

hero dependencies are not a compliment

Every company has the person.

The one who understands payments. Or Kubernetes. Or the pricing engine. Or why settlement fails on the last business day of the month in one specific market.

At first, this person looks like an asset. They are fast, helpful, reliable, and terrifyingly knowledgeable.

Then they go on vacation.

Suddenly the whole organization discovers that “we have documentation” meant “we have a Slack thread from March and maybe a diagram in someone’s Google Drive.”

Hero dependencies are tactical debt with a friendly face.

They feel good because heroes save the day. They are dangerous because the system learns to depend on being saved.

The answer is not to punish strong engineers. The answer is to stop turning competence into a single point of failure.

Pair on risky domains. Rotate operational ownership. Write down the weird parts. Make onboarding to critical systems a real activity. Reward people for making themselves less required, not more indispensable.

A team that cannot survive one engineer taking a holiday is not high-performing.

It is fragile.
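The vacation test can be checked with very little tooling. The sketch below assumes a hypothetical, hand-maintained map from each critical system to the people who can operate it unassisted; the systems and people are made up, and the threshold of two operators is an arbitrary starting point rather than a standard.

```python
def single_points_of_failure(
    operators: dict[str, set[str]], minimum: int = 2
) -> list[str]:
    """Systems that fewer than `minimum` people can operate unassisted."""
    return sorted(s for s, people in operators.items() if len(people) < minimum)


def blocked_during_absence(operators: dict[str, set[str]], away: str) -> list[str]:
    """Systems that nobody can operate while `away` is out."""
    return sorted(s for s, people in operators.items() if not (people - {away}))


# Illustrative data only.
operators = {
    "payments": {"ana"},
    "kubernetes": {"rodrigo", "mariana"},
    "pricing-engine": {"ahmed"},
}

print(single_points_of_failure(operators))       # ['payments', 'pricing-engine']
print(blocked_during_absence(operators, "ana"))  # ['payments']
```

The first list is a pairing and rotation backlog; the second answers, before the flight is booked, what freezes while someone is away.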

reactive fire drills make planning fictional

A sprint plan is a hypothesis.

In some teams, it is also fiction.

Not because engineers are lazy. Not because product is bad at prioritization. But because the organization has accepted constant interruption as normal.

Production is always on fire. Incidents are frequent. Customer escalations bypass prioritization. Leadership asks for “quick checks” that become full projects. Every week starts with a plan and ends with a pile of emergency context switches.

This is one of the most brutal forms of tactical debt because it attacks deep work.

Software engineering needs sustained attention. You cannot design well in twenty-minute fragments between escalations. You cannot reason about distributed systems while half your brain is waiting for the next alert.

Fire drills also create a nasty management illusion: the team looks busy, responsive, and important.

But responsiveness is not the same as progress.

If every week is exceptional, nothing is exceptional. The system is just underdesigned.

Fixing this usually requires boring operational discipline: incident review, error budgets, better alert quality, platform investment, clearer escalation paths, and the courage to say that not every interruption deserves immediate engineering attention.

Velocity is not how fast you react to chaos.

Velocity is how much chaos you no longer create.
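Error budgets, one of the disciplines listed above, turn "how much unreliability can we tolerate?" into arithmetic. In a simple time-based reading, the budget for an availability SLO is `(1 - SLO) * window`: a 99.9% SLO over 30 days allows 43.2 minutes of unavailability. The sketch below assumes downtime measured in minutes; the SLO and the outage durations are illustrative values.

```python
def error_budget_minutes(slo: float, window_days: int = 30) -> float:
    """Allowed minutes of unavailability for an availability SLO over a window."""
    if not 0.0 < slo < 1.0:
        raise ValueError("slo must be a fraction between 0 and 1, e.g. 0.999")
    return (1.0 - slo) * window_days * 24 * 60


def budget_remaining(
    slo: float, downtime_minutes: list[float], window_days: int = 30
) -> float:
    """Fraction of the error budget left after the given outages (negative if overspent)."""
    budget = error_budget_minutes(slo, window_days)
    return (budget - sum(downtime_minutes)) / budget


budget = error_budget_minutes(0.999)            # 43.2 minutes for a 30-day window
left = budget_remaining(0.999, [12.0, 25.0])    # assumed outages in this window
print(f"budget: {budget:.1f} min, remaining: {left:.0%}")  # budget: 43.2 min, remaining: 14%
```

The number matters less than the agreement behind it. A common policy is that when the budget runs out, reliability work takes priority over new features, which gives the team a principled way to say no to the next fire drill instead of absorbing it silently.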

poor handoffs are where context goes to die

Bad handoffs are incredibly expensive because they usually happen at boundaries: between product and engineering, engineering and QA, development and operations, one team and another team.

The classic version is “it works on my machine.”

But the more subtle version is worse.

A team finishes a project and throws it over the wall with incomplete context. The receiving team gets code but not the assumptions. A migration plan but not the failure modes. An API contract but not the customer promises. A dashboard but not the meaning of the alerts.

Then everyone acts surprised when the handoff creates rework.

Good handoffs are not ceremonies. They are context transfer.

What changed? Why? What tradeoffs were made? What should the next team watch? What is safe to modify? What is load-bearing and ugly but intentional?

If that information is not transferred, the receiving team pays the tax later.

Usually during an incident.
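The handoff questions in this section can double as a template that refuses to be "done" while context is missing. The sketch below is a hypothetical record whose fields mirror those questions; the field names and the example handoff are illustrative.

```python
from dataclasses import dataclass, fields


@dataclass
class Handoff:
    """Hypothetical handoff record; each field mirrors one handoff question."""
    what_changed: str = ""
    why: str = ""
    tradeoffs: str = ""
    what_to_watch: str = ""
    safe_to_modify: str = ""
    load_bearing_but_intentional: str = ""


def missing_context(handoff: Handoff) -> list[str]:
    """Names of fields left blank or whitespace-only, in declaration order."""
    return [f.name for f in fields(handoff) if not getattr(handoff, f.name).strip()]


h = Handoff(
    what_changed="Moved settlement retries to a queue",
    why="Batch job timed out at month end",
    what_to_watch="Queue depth alert on the last business day",
)
print(missing_context(h))  # ['tradeoffs', 'safe_to_modify', 'load_bearing_but_intentional']
```

A blank field is not proof that nothing matters there; it is a prompt for the sending team to answer before the receiving team has to guess.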

information silos turn companies into slow databases

Information silos are not just a documentation problem. They are a query problem.

In a healthy organization, an engineer can ask, “Why does this work this way?” and find a reliable answer.

In a siloed organization, the answer exists somewhere, but the retrieval mechanism is social luck.

Maybe it is in a private channel. Maybe it was discussed in a meeting that was never documented. Maybe it lives in the head of someone who moved teams. Maybe it was in an ADR, but the ADR folder has thirteen conflicting versions and no one knows which one matters.

So engineers query the company like a bad distributed database.

They ask around. They wait. They get partial answers. They merge gossip with code reading. Then they make a decision with low confidence.

This is slow, but worse than slow: it makes teams conservative.

When people cannot understand the system, they avoid changing it. When they must change it, they over-escalate. When they over-escalate, the organization adds process. The boulder gets heavier.

Documentation helps, but only if it is part of the work, not a guilt ritual after the work.

A useful rule: if a decision will matter in three months, write it where future engineers will actually look.

excess approvals create responsibility without authority

Approvals are seductive.

They make risk feel managed. They create a trail. They give leadership a sense that important changes are being controlled.

Sometimes approvals are necessary. Regulated systems, financial flows, security-sensitive changes, and irreversible actions need real governance.

But excess approvals are different. They are what happens when an organization does not trust its own operating model.

Instead of giving teams clear guardrails and authority, it inserts humans into every meaningful step.

Architecture review. Security review. Platform review. Product review. Release review. Manager review. Sometimes each one is valid in isolation. Together, they create a system where everyone is responsible and nobody is empowered.

The cost is not only waiting time.

The cost is learned helplessness.

Engineers stop making decisions because decisions will be relitigated anyway. Teams optimize for approval rather than outcomes. Documents become defensive. Meetings become political. Velocity becomes the speed at which consensus can be manufactured.

Good governance should make the safe decisions obvious and the risky decisions explicit.

Bad governance makes every decision slow.
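One way to read "make the safe decisions obvious and the risky decisions explicit" is as a policy function: ordinary changes ship on the default path, and only changes with specific risk markers pick up extra review. The risk attributes and review names below are hypothetical; a real policy would come from the organization's own regulatory and security requirements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Change:
    """Hypothetical description of a proposed change."""
    touches_payments: bool = False
    touches_auth: bool = False
    irreversible: bool = False  # e.g. a destructive data migration


def required_reviews(change: Change) -> list[str]:
    """Extra reviews beyond normal code review; an empty list means the default path."""
    reviews = []
    if change.touches_payments:
        reviews.append("financial-controls")
    if change.touches_auth:
        reviews.append("security")
    if change.irreversible:
        reviews.append("migration-review")
    return reviews


print(required_reviews(Change()))                                      # []
print(required_reviews(Change(touches_auth=True, irreversible=True)))  # ['security', 'migration-review']
```

Because every review is tied to a named risk, adding or removing one becomes a deliberate, visible decision instead of a slow accumulation.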

tribal knowledge is a loan with terrible interest

Tribal knowledge is not automatically bad. Every team has local knowledge. Some things are easier to learn from people than from documents.

The problem starts when tribal knowledge becomes the primary storage layer for critical information.

Why do we deploy on Thursdays? Ask Mariana.

Why does this job run twice? Ask Ahmed.

Why is this field nullable? Ask the person who left last year.

At that point, the organization is not saving time by avoiding documentation. It is taking a loan.

The interest is paid by every future engineer who has to rediscover the same context.

AI makes this more interesting, and not always in a good way.

Generated code can move faster than organizational memory. If your team cannot explain why the system works the way it does, AI will happily produce changes that look reasonable and violate assumptions nobody wrote down.

The problem is not that AI is bad.

The problem is that AI amplifies whatever the surrounding system already is, good or bad.

If the surrounding system is full of tribal knowledge, unclear ownership, and manual release paths, AI will not save velocity. It will generate more work waiting to get stuck.

why tactical debt compounds

The dangerous thing about tactical debt is that each piece reinforces the others.

Unclear ownership creates more meetings.

Manual deployments require more approvals.

Hero dependencies create poor handoffs.

Poor handoffs create fire drills.

Fire drills prevent documentation.

Missing documentation strengthens tribal knowledge.

Tribal knowledge makes ownership harder to clarify.

And around we go.

This is why tactical debt feels normal from the inside. It does not arrive as one catastrophic decision. It accumulates as reasonable local compromises.

One manual step.

One exception process.

One undocumented decision.

One “let’s just ask Ana.”

One urgent escalation that bypasses the roadmap.

None of these feel fatal. Together, they become the operating system of the team.

Then leadership asks why engineering velocity is down.

The answer is chained to the team in plain sight.

the fix is operational design

You do not fix tactical debt by telling engineers to “move faster.”

You fix it by redesigning how work flows.

A few questions are more useful than most maturity models:

Can we name the owner of every production service?
Can a normal change reach production without heroics?
Can someone understand the current state of work without attending a meeting?
Can critical people take vacation without the team freezing?
Do incidents produce system improvements, or just exhaustion?
Are decisions written where future engineers will find them?
Are approvals protecting real risk, or compensating for missing trust?
Can a new engineer safely change an important system without a treasure hunt?

If the answer to most of these is no, the problem is not motivation.

It is tactical debt.

And the best teams treat it as engineering work.

They automate the manual path. They clarify ownership. They make decisions searchable. They reduce approval scope. They rotate knowledge. They improve handoffs. They measure interruption. They design communication channels instead of letting them decay into notification soup.

This work is not glamorous. It rarely produces a launch announcement. Nobody gets promoted because the deployment checklist got deleted.

But this is the work that makes every future feature cheaper.

AI does not remove the boulder

This is the part I think many companies are about to learn painfully.

AI can increase code production. It can help write tests, generate scaffolding, explain unfamiliar code, draft migrations, and accelerate repetitive engineering tasks.

That is useful.

But velocity is not code output.

Velocity is validated change delivered safely to users.

If your organization cannot absorb change, faster code generation just creates a larger queue in front of the same bottlenecks.

More pull requests waiting for unclear owners.

More generated changes waiting for manual deployment.

More code paths nobody understands because the original assumptions lived in someone’s head.

More review burden on the same heroes.

More incidents because the system got changed faster than the operating model improved.
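One way to check whether faster code generation is reaching users or just lengthening the queue is to track a DORA-style delivery metric such as lead time for changes: the time from commit to running in production. The sketch assumes you can export (commit time, deploy time) pairs from your own tooling; the timestamps below are invented.

```python
from datetime import datetime, timedelta
from statistics import median


def median_lead_time(changes: list[tuple[datetime, datetime]]) -> timedelta:
    """Median time from commit to production deploy across the given changes."""
    if not changes:
        raise ValueError("need at least one change")
    seconds = [(deployed - committed).total_seconds() for committed, deployed in changes]
    return timedelta(seconds=median(seconds))


# Invented (commit, deploy) pairs: one fast change, two that waited weeks.
changes = [
    (datetime(2024, 5, 1, 9, 0), datetime(2024, 5, 1, 15, 0)),    # 6 hours
    (datetime(2024, 5, 2, 10, 0), datetime(2024, 5, 16, 10, 0)),  # 14 days
    (datetime(2024, 5, 3, 11, 0), datetime(2024, 5, 24, 11, 0)),  # 21 days
]

print(median_lead_time(changes))  # 14 days, 0:00:00
```

The median is used because one stuck change distorts it less than it would distort the mean. If AI adoption raises the number of pull requests while this number stays flat or grows, the bottleneck is not the keyboard.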

AI does not magically fix tactical debt. In many teams, it will expose it.

That is not a reason to avoid AI. It is a reason to stop pretending that engineering productivity is only about typing speed.

The bottleneck was rarely the keyboard.

remove weight before adding horsepower

If a team is chained to a boulder, buying a faster bicycle is not the first-order fix.

You need to remove weight.

Not all at once. Not with a grand transformation program that creates three more steering committees and somehow makes the boulder larger.

Start smaller and sharper.

Pick one painful release process and automate it.

Pick one orphaned service and assign real ownership.

Pick one recurring status meeting and replace it with a written operating rhythm people trust.

Pick one hero dependency and deliberately spread the knowledge.

Pick one approval step and ask what risk it actually controls.

Pick one incident pattern and make it less likely to happen again.

The point is not process minimalism for its own sake. Some process is good. Adults need coordination.

The point is to make the organization lighter.

Because engineering velocity is not only a property of engineers. It is a property of the system engineers work inside.

And tactical debt is what happens when that system quietly gets heavier every month.

Clean code helps.

AI helps.

Better tools help.

But if the boulder stays chained to the team, every productivity gain will be absorbed by the same old drag.

The best engineering organizations I have seen are not the ones with zero debt. That does not exist.

They are the ones that can tell the difference between code that needs refactoring and an operating model that needs repair.

Then they repair both.

References

Martin Fowler, TechnicalDebt
Google SRE Book, Postmortem Culture: Learning from Failure
Accelerate / DORA research, Four Keys metrics
Team Topologies, Team Topologies patterns

DEV Community