I have published details at https://execution-ledger.vercel.app.
I’ve spent over 18 years building systems. This is what I learned studying the ones that failed. It’s not about code quality. It’s about decisions made 6–12 months before the failure.
The Failures Fall Into Seven Categories
1. Execution Model Mismatch

The mistake: you adopt Scrum because it’s the industry default.
Reality: customer contracts require fixed scope.
Result: permanent scope creep, client rage, a revert to waterfall 18 months later.

Case: Manufacturing software company. Fixed-price customer contracts + Scrum = disaster. They couldn’t deliver features within a sprint because scope kept changing. When they tried to cap scope, customers said “that wasn’t in the original agreement.” It took them two years to realize Scrum doesn’t work with a fixed-scope business model.

The better approach:
- If you sell fixed scope → use waterfall or hybrid (plan-driven + iterative delivery)
- If you sell outcomes (SaaS, platform) → use Agile
- If you have contracts where scope changes mid-project → use rolling wave + time/materials
- If you’re building AI products → use experimentation-driven delivery (test and learn rapidly)

The framework matters, and the business model determines which framework fits.
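As a rough sketch of that mapping (the function and category names below are illustrative, not from any published framework), the rule can be written as a small decision table:

```python
# Hypothetical sketch: map the business model to a delivery approach.
# The categories and return strings are illustrative, not prescriptive.
def pick_execution_model(sells_fixed_scope: bool,
                         scope_changes_midproject: bool,
                         building_ai_product: bool) -> str:
    if building_ai_product:
        return "experimentation-driven (rapid test-and-learn)"
    if scope_changes_midproject:
        return "rolling wave + time-and-materials"
    if sells_fixed_scope:
        return "waterfall or hybrid (plan-driven + iterative delivery)"
    return "agile (outcome-based: SaaS, platform)"

print(pick_execution_model(sells_fixed_scope=True,
                           scope_changes_midproject=False,
                           building_ai_product=False))
# -> waterfall or hybrid (plan-driven + iterative delivery)
```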
2. Estimation Theater

The mistake: you choose story points, function points, T-shirt sizing, or AI estimation. Everyone feels confident. Six months later: velocity has dropped 40% and you’ve hit integration hell.

Why every method fails: they all conflate three different things:
- Effort: How many engineer-hours?
- Complexity: How many unknowns?
- Execution risk: What could go wrong?

Example: a microservices migration.
- Services 1–5: estimated at 13 story points each. Actual: crushed them in 3 months.
- Services 6–10: same 13-point estimate. Actual: took 6 months (integration complexity).
- Services 11–25: still a 13-point estimate. Actual: each took 3–4 months (cascading API dependencies).

The effort was consistent. The complexity exploded.

The better approach: estimate each dimension separately:
- Effort: 120 engineer-hours
- Complexity: Low (5 unknowns), Medium (15 unknowns), High (40+ unknowns)
- Risk: Technical risk, integration risk, vendor risk, talent risk
- Contingency: add a 30–50% buffer based on complexity/risk

Then commit to the effort, not the timeline.
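A minimal sketch of what estimating the dimensions separately could look like; the thresholds, field names, and buffer formula are illustrative assumptions, not part of the article’s method:

```python
# Hypothetical sketch: keep effort, complexity, and risk as separate inputs,
# then derive a contingency buffer instead of committing to a single date.
from dataclasses import dataclass

@dataclass
class Estimate:
    effort_hours: float  # engineer-hours, e.g. 120
    unknowns: int        # proxy for complexity
    risks: int           # count of identified technical/integration/vendor/talent risks

    def complexity(self) -> str:
        if self.unknowns >= 40:
            return "high"
        if self.unknowns >= 15:
            return "medium"
        return "low"

    def buffered_hours(self) -> float:
        # 30% buffer for low complexity up to 50% for high, nudged by risk count.
        base = {"low": 0.30, "medium": 0.40, "high": 0.50}[self.complexity()]
        return self.effort_hours * (1 + min(base + 0.02 * self.risks, 0.6))

e = Estimate(effort_hours=120, unknowns=15, risks=3)
print(e.complexity(), round(e.buffered_hours()))  # medium 175
```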
3. Testing Confidence

The mistake:
- Team metric: 80% code coverage + shift-left testing + AI test generation
- Production metric: same incident rate as two years ago

Why the metrics lie:
- Code coverage measures “did code run?” not “does it work?”
- Shift-left testing optimizes for happy paths; production is 99% edge cases
- AI-generated tests have the same blind spots as human tests (wrong data, wrong scale)

Case: Retail company. 85% automation coverage. Tests pristine. Black Friday:
- Tests ran on clean data with stable traffic
- Real traffic: 100x spike, cache invalidation, connection pool exhaustion
- Production down for 6 hours
- Cost: $5M in lost revenue

The better approach: stop asking “does it work?” Start asking “when does it break?”
```python
# Instead of this:
def test_user_creation():
    user = create_user("john@example.com", "password123")
    assert user.id is not None

# Do this:
def test_user_creation_under_load():
    # What breaks at 1000 req/sec?
    load_test(create_user, requests=1000)

def test_user_creation_with_db_slow():
    # What if the DB is slow?
    slow_db_connection()
    user = create_user(…)
    assert user.id is not None  # or does it time out?

def test_user_creation_with_concurrent_writes():
    # What if duplicate emails hit simultaneously?
    concurrent(lambda: create_user("john@example.com", "pass"))
    assert no_duplicates()
```
This is chaos engineering. Failure injection. Production validation.
Teams that do this have 70% fewer incidents.
4. AI-Generated Code Debt
The mistake:
Month 1: We’re using Copilot! Velocity +30%!
Month 6: Code reviews impossible, bugs doubled, refactoring nightmare
Month 9: Codebase cleanup takes 2 months. Productivity goes negative.
Why it happens:
Copilot generates:

```python
def process_orders(orders):
    results = []
    for order in orders:
        price = order['price'] * order['quantity']
        if price > 1000:
            price = price * 0.9  # 10% discount
        results.append({'order_id': order['id'], 'total': price})
    return results
```
This code:
- Works immediately (happy path)
- Is 30% faster to write than thinking through the design
- Has hidden bugs: no error handling, no type checking, and it duplicates discount logic that already exists elsewhere
- Creates technical debt immediately (now you have 47 places applying discounts)
The better approach:
Use AI for:
- Low-stakes code (utilities, boilerplate, tests)
- Well-defined problems (implement spec, not design spec)
- Repetitive patterns (CRUD endpoints, error handlers)

Don’t use AI for:
- Decision-making code (architecture, core algorithms)
- Novel problems (nobody’s solved this before)
- Critical paths (security, payment processing, data integrity)

Measure:
- Code quality metrics (cyclomatic complexity, test coverage)
- Bug density (bugs per 1000 LOC)
- Maintenance cost (refactoring hours per quarter)

Not just velocity.
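To make the hidden-bug list concrete, here is one way the generated snippet above could be hardened. This is a sketch under assumptions: apply_discount is a hypothetical shared helper so the discount rule lives in exactly one place, and Decimal keeps money math exact.

```python
# Sketch of a hardened version of the generated snippet above.
from decimal import Decimal, InvalidOperation

DISCOUNT_THRESHOLD = Decimal("1000")
DISCOUNT_RATE = Decimal("0.9")

def apply_discount(price: Decimal) -> Decimal:
    """Single source of truth for the bulk-order discount."""
    return price * DISCOUNT_RATE if price > DISCOUNT_THRESHOLD else price

def process_orders(orders: list[dict]) -> list[dict]:
    results = []
    for order in orders:
        try:
            price = Decimal(str(order["price"])) * int(order["quantity"])
            results.append({"order_id": order["id"], "total": apply_discount(price)})
        except (KeyError, ValueError, TypeError, InvalidOperation) as exc:
            # Don't silently drop malformed orders; surface them.
            raise ValueError(f"Malformed order: {order!r}") from exc
    return results
```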
5. Observability Theater

The mistake:
- Team collects: 10,000 metrics
- Team actually uses: 3 metrics (and greps the logs for everything else)

Why engineers ignore observability: you built dashboards like this:

Dashboard: “System Health”
- CPU: 45%
- Memory: 62%
- Network: 12%
- Latency p99: 245ms
- Requests/sec: 5,200
- Error rate: 0.12%
- (100 more metrics)

The engineer needs to debug: “Users are complaining requests are slow.”

What she actually does: tail -f /var/log/app.log | grep "slow" (and finds the problem in 2 minutes), because the dashboard says “all systems nominal” while the log shows “database query took 45 seconds.”

The better approach: design observability for decisions, not data collection.

Ask: what does the CTO need to know to make a decision?
- Is this an outage? (binary: yes/no)
- How many users are affected? (a number)
- What’s broken? (service name, error type)
- What’s the blast radius? (cascading failures?)
- Can we roll back? (undo the last deploy?)

Build dashboards for those five things. Everything else is optional.

Case: Media company built “state-of-the-art” observability. Alert fatigue killed adoption. During a 45-minute outage, alerts fired and nobody saw them. Cost: $2M.
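A minimal sketch of a status object built around those five questions (the class, fields, and thresholds are illustrative assumptions):

```python
# Hypothetical sketch: one status object that answers the five decision
# questions, instead of a wall of 10,000 metrics.
from dataclasses import dataclass

@dataclass
class IncidentStatus:
    is_outage: bool          # Is this an outage? (yes/no)
    users_affected: int      # How many users are affected?
    broken_service: str      # What's broken? (service name, error type)
    blast_radius: list[str]  # Which downstream services are cascading?
    can_roll_back: bool      # Can we undo the last deploy?

    def summary(self) -> str:
        if not self.is_outage:
            return "No outage."
        return (f"OUTAGE: {self.broken_service}, {self.users_affected} users affected, "
                f"cascading into {', '.join(self.blast_radius) or 'nothing'}; "
                f"rollback {'possible' if self.can_roll_back else 'NOT possible'}.")

status = IncidentStatus(True, 12_000, "checkout-db: query timeout",
                        ["cart", "payments"], can_roll_back=True)
print(status.summary())
```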
6. Organization Transformation Gravity

The mistake: you modernize the tech stack:
- Microservices? ✓
- DevOps? ✓
- Cloud? ✓

But you keep the 1995 org structure:
- Approval process: 12 months
- Hiring: 3 months per person
- Team structure: silos by function

Result: the DevOps team can deploy in 5 minutes but must wait 12 months for approval.

Case: Telecom company. Modernized to DevOps + microservices. Hired fancy architects. Built beautiful infrastructure. But:
- Feature requests went through legacy billing system approval
- Billing system was sold as “modules” (no flexibility)
- Customer contracts were 12-month fixed scope
- The agile dev team clashed with fixed-scope requirements

Three years later: $50M spent, zero improvement in time-to-market, three leadership changes.

The better approach: transform the org first. Tech follows, not the reverse. Ask:
- How fast can we make decisions? (1 week? 1 month? 1 quarter?)
- How much autonomy do teams have? (full? subject to approval?)
- How aligned are incentives? (shipping fast? cost control? risk aversion?)
- Can we move people between teams? (reorg cost? retention risk?)

Legacy gravity is stronger than new technology. You can’t microservices your way out of broken incentives.
7. Vendor Lock-In Sold as Innovation

The mistake:
- You choose: “leading vendor, industry standard, great case studies”
- The vendor optimizes for: lock-in (proprietary APIs, custom languages, switching cost)
- The vendor roadmap is driven by: its highest-paying customers (usually not you)
- The vendor gets acquired by: a new owner who kills the product line

Result: you’re stuck (switching cost: $5M+) or forced to rewrite.

Case: Fintech company chose a vendor for its “core platform.” The vendor’s messaging:
- “Built for fintech”
- “Enterprise-grade”
- “Zero downtime deployments”

Three years in:
- Vendor acquired
- New owner killed product line
- Migration required complete rewrite
- Cost: $8M, 18 months, three departures

The better approach: assume vendors will disappoint.
- Use open standards (SQL, REST, standard frameworks)
- Own critical data flows (never let vendor own your data)
- Keep switching costs low (avoid proprietary APIs)
- Plan the exit (what would it cost to migrate?)
- Diversify risk (use multiple vendors where possible)
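Keeping switching costs low usually comes down to putting a thin seam between your code and the vendor. A minimal sketch, assuming a hypothetical vendor SDK; the point is that only one adapter module knows the vendor exists:

```python
# Hypothetical sketch: isolate the vendor behind an interface you own,
# so "plan the exit" means rewriting one adapter, not the whole system.
from typing import Protocol

class PaymentGateway(Protocol):
    def charge(self, account_id: str, amount_cents: int) -> str:
        """Return a payment reference."""
        ...

class AcmePayAdapter:
    """The ONLY place that touches the vendor SDK (the SDK here is illustrative)."""
    def __init__(self, client):  # e.g. an injected vendor client object
        self._client = client

    def charge(self, account_id: str, amount_cents: int) -> str:
        resp = self._client.create_charge(account=account_id, amount=amount_cents)
        return resp["reference"]

def settle_invoice(gateway: PaymentGateway, account_id: str, amount_cents: int) -> str:
    # Business logic depends on the interface, never on the vendor.
    return gateway.charge(account_id, amount_cents)
```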
The Decision Framework That Actually Works

Step 1: Name the decision (vendor, execution model, tech stack, scaling, migration strategy, etc.)

Step 2: Model primary consequences (6–18 months)
- How much engineering effort?
- How much operational burden?
- Learning curve?
- Cost trajectory?

Step 3: Model secondary consequences (18–36 months)
- Migration cost (how hard to undo?)
- Vendor risk (acquisition? shutdown?)
- Talent risk (hiring/retention?)
- Organizational risk (cultural change needed?)

Step 4: Model tertiary consequences (3+ years)
- Lock-in (are we stuck forever?)
- Scaling limits (where does this break?)
- Obsolescence (outdated in 5 years?)
- Opportunity cost (what can’t we do?)

Step 5: Decide transparently

You’re not choosing “best.” You’re choosing “best trade-off given constraints.” Document it. Six months from now, you’ll need to remember why you made this call.
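One lightweight way to “document it” is a small decision record that captures the decision and the three consequence horizons from Steps 2–4. A sketch with illustrative field names and example values, not a prescribed template:

```python
# Hypothetical sketch of a decision record following the framework above.
from dataclasses import dataclass, field
from datetime import date

@dataclass
class DecisionRecord:
    decision: str                                        # Step 1: name the decision
    primary: list[str] = field(default_factory=list)     # 6–18 months
    secondary: list[str] = field(default_factory=list)   # 18–36 months
    tertiary: list[str] = field(default_factory=list)    # 3+ years
    chosen_tradeoff: str = ""                            # Step 5: why this trade-off
    decided_on: date = field(default_factory=date.today)

record = DecisionRecord(
    decision="Adopt vendor X for the core platform",
    primary=["3 engineer-months of integration", "new on-call burden"],
    secondary=["exit would require a rewrite", "vendor acquisition risk"],
    tertiary=["proprietary API lock-in", "scaling limits unknown"],
    chosen_tradeoff="Fastest time-to-market; we accept the exit cost and review in 12 months.",
)
print(record.decision, record.decided_on)
```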
Test Your Own Judgment

I built a simple simulator that walks you through this framework in 2 minutes. You make architecture decisions. https://execution-ledger.vercel.app

No login. No signup. Just test yourself.