# What hampers IT Project executions

I've spent over 18 years building systems. This is what I learned studying the ones that failed. It's not about code quality; it's about decisions made 6–12 months before the failure. Full details are published at https://execution-ledger.vercel.app.
## The Failures Fall Into Seven Categories

### 1. Execution Model Mismatch

The mistake: adopting Scrum because it's the industry default.
Reality: customer contracts require fixed scope.
Result: permanent scope creep, client rage, and a revert to waterfall 18 months later.

Case: a manufacturing software company. Fixed-price customer contracts + Scrum = disaster. They couldn't deliver features within a sprint because scope kept changing. When they tried to cap scope, customers said "that wasn't in the original agreement." It took them two years to realize Scrum doesn't work with a fixed-scope business model.

The better approach:
- If you sell fixed scope → use waterfall or hybrid (plan-driven + iterative delivery)
- If you sell outcomes (SaaS, platform) → use Agile
- If you have contracts where scope changes mid-project → use rolling wave + time-and-materials
- If you're building AI products → use experimentation-driven development (test and learn rapidly)

The framework matters, but it's the business model that determines the framework, as the sketch below makes concrete.
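To make the rule concrete, here is a minimal sketch that encodes the mapping above as code. The flag names and return strings are my own shorthand for the list, not an established taxonomy:

```python
# Sketch: the business model determines the framework. The flags and
# return strings below are illustrative shorthand, not a standard.

def pick_execution_model(fixed_scope: bool, scope_changes_mid_project: bool,
                         ai_product: bool) -> str:
    if ai_product:
        return "experimentation-driven (test and learn rapidly)"
    if scope_changes_mid_project:
        return "rolling wave + time-and-materials"
    if fixed_scope:
        return "waterfall or hybrid (plan-driven + iterative delivery)"
    return "agile"  # selling outcomes (SaaS, platform)

# The manufacturing case above: fixed-price contracts, no mid-project
# renegotiation clause, not an AI product -> waterfall/hybrid, not Scrum.
print(pick_execution_model(fixed_scope=True,
                           scope_changes_mid_project=False,
                           ai_product=False))
```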
### 2. Estimation Theater

The mistake: you choose story points, function points, T-shirt sizing, or AI estimation. Everyone feels confident. Six months later, velocity has dropped 40% and you've hit integration hell.

Why every method fails: they all conflate three different things:
- Effort: how many engineer-hours?
- Complexity: how many unknowns?
- Execution risk: what could go wrong?

Example: a microservices migration.

- Services 1–5: estimated at 13 story points each. Actual: crushed in 3 months.
- Services 6–10: same 13-point estimate. Actual: 6 months (integration complexity).
- Services 11–25: still a 13-point estimate. Actual: 3–4 months each (cascading API dependencies).

The effort was consistent. The complexity exploded.

The better approach: estimate separately (a worked sketch follows this list):
- Effort: 120 engineer-hours
- Complexity: low (5 unknowns), medium (15 unknowns), high (40+ unknowns)
- Risk: technical risk, integration risk, vendor risk, talent risk
- Contingency: add a 30–50% buffer based on complexity and risk

Then commit to the effort, not the timeline.
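A minimal sketch of what "estimate separately" could look like, using the numbers above. The buffer percentages are my illustrative reading of the 30–50% guidance, not a prescription:

```python
# Sketch: keep effort, complexity, and risk as separate estimates instead
# of one point score. Buffer values are an illustrative mapping of the
# 30-50% range above, not a standard.

BUFFER_BY_COMPLEXITY = {"low": 0.30, "medium": 0.40, "high": 0.50}

def estimate(effort_hours: float, complexity: str, risks: list[str]) -> dict:
    """Return the pieces separately, plus a buffered effort commitment."""
    buffer = BUFFER_BY_COMPLEXITY[complexity]
    return {
        "effort_hours": effort_hours,
        "complexity": complexity,
        "risks": risks,  # named explicitly, not folded into one number
        "buffered_hours": effort_hours * (1 + buffer),
    }

# The example above: 120 engineer-hours at high complexity -> commit to
# 120 * 1.5 = 180 budgeted hours, and to the effort, not a calendar date.
print(estimate(120, "high", ["integration", "vendor"]))
```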
### 3. Testing Confidence

The mistake: the team's metric says 80% code coverage + shift-left testing + AI test generation. The production metric says the same incident rate as two years ago.

Why the metrics lie:
- Code coverage measures "did the code run?", not "does it work?"
- Shift-left testing optimizes for happy paths; production is 99% edge cases
- AI-generated tests have the same blind spots as human tests (wrong data, wrong scale)

Case: a retail company. 85% automation coverage. Pristine tests. Then Black Friday:
- Tests ran on clean data with stable traffic
- Real traffic: a 100x spike, cache invalidation, connection pool exhaustion
- Production was down for 6 hours
- Cost: $5M in lost revenue

The better approach: stop asking "does it work?" Start asking "when does it break?"
```python
# Instead of this (happy path only):
def test_user_creation():
    user = create_user("john@example.com", "password123")
    assert user.id is not None

# Do this instead. (load_test, slow_db_connection, concurrent, and
# no_duplicates are illustrative helpers, not a specific library.)
def test_user_creation_under_load():
    # What breaks at 1000 req/sec?
    load_test(create_user, requests=1000)

def test_user_creation_with_db_slow():
    # What if the DB is slow?
    slow_db_connection()
    user = create_user("john@example.com", "password123")
    assert user.id is not None  # or does it time out?

def test_user_creation_with_concurrent_writes():
    # What if duplicate emails hit simultaneously?
    concurrent(lambda: create_user("john@example.com", "password123"))
    assert no_duplicates()
```
This is chaos engineering. Failure injection. Production validation.
Teams that do this have 70% fewer incidents.
### 4. AI-Generated Code Debt
The mistake:

- Month 1: "We're using Copilot! Velocity +30%!"
- Month 6: code reviews are impossible, bugs have doubled, refactoring is a nightmare
- Month 9: codebase cleanup takes 2 months; productivity goes negative
Why it happens:
Copilot generates:

```python
def process_orders(orders):
    results = []
    for order in orders:
        price = order['price'] * order['quantity']
        if price > 1000:
            price = price * 0.9  # 10% discount
        results.append({'order_id': order['id'], 'total': price})
    return results
```
This code:

- Works immediately (happy path)
- Is 30% faster to write than thinking through the design
- Has hidden bugs: no error handling, no type checking, and it duplicates discount logic that already lives elsewhere
- Creates technical debt immediately (now you have 47 places doing discounts)
The better approach (a hedged refactor of the snippet above follows these lists):

Use AI for:
- Low-stakes code (utilities, boilerplate, tests)
- Well-defined problems (implement a spec, don't design one)
- Repetitive patterns (CRUD endpoints, error handlers)

Don't use AI for:
- Decision-making code (architecture, core algorithms)
- Novel problems (nobody has solved this before)
- Critical paths (security, payment processing, data integrity)

Measure, not just velocity:
- Code quality metrics (cyclomatic complexity, test coverage)
- Bug density (bugs per 1,000 LOC)
- Maintenance cost (refactoring hours per quarter)
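For contrast with the Copilot snippet above, here is one possible refactor. It is a sketch, not the author's code: `DISCOUNT_THRESHOLD`, `DISCOUNT_RATE`, and the error handling are my own additions, illustrating "centralize the discount logic and fail loudly":

```python
# Sketch of a refactor: one place for discount rules, explicit validation.
# All names here are illustrative assumptions, not from the original post.

DISCOUNT_THRESHOLD = 1000
DISCOUNT_RATE = 0.10

def discounted_total(price: float, quantity: int) -> float:
    """Single source of truth for the discount rule."""
    total = price * quantity
    if total > DISCOUNT_THRESHOLD:
        total *= 1 - DISCOUNT_RATE
    return total

def process_orders(orders: list[dict]) -> list[dict]:
    results = []
    for order in orders:
        try:
            total = discounted_total(order["price"], order["quantity"])
            results.append({"order_id": order["id"], "total": total})
        except (KeyError, TypeError) as exc:
            # Fail loudly instead of silently emitting bad totals.
            raise ValueError(f"Malformed order: {order!r}") from exc
    return results
```

Now the discount rule lives in exactly one function, and malformed input raises instead of corrupting totals downstream.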
### 5. Observability Theater

The mistake: the team collects 10,000 metrics but actually uses 3 (and greps the logs for everything else).

Why engineers ignore observability. You built a dashboard like this:

Dashboard: "System Health"
- CPU: 45%
- Memory: 62%
- Network: 12%
- Latency p99: 245ms
- Requests/sec: 5,200
- Error rate: 0.12%
- (100 more metrics)

An engineer needs to debug "users are complaining that requests are slow." What she actually does: `tail -f /var/log/app.log | grep "slow"`, and finds the problem in 2 minutes. The dashboard tells her "all systems nominal"; the log shows "database query took 45 seconds."

The better approach: design observability for decisions, not data collection. Ask: what does the CTO need to know to make a decision?

- Is this an outage? (binary: yes/no)
- How many users are affected? (a number)
- What's broken? (service name, error type)
- What's the blast radius? (cascading failures?)
- Can we roll back? (undo the last deploy?)

Build dashboards for those 5 things. Everything else is optional.

Case: a media company built "state-of-the-art" observability. Alert fatigue killed adoption. During a 45-minute outage, alerts fired and nobody saw them. Cost: $2M.
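A minimal sketch of what a decision-oriented status payload could look like, built around those five questions. All field names are my own illustration, not a real schema:

```python
# Sketch: an incident status answering the five decision questions above.
# Field names are illustrative assumptions, not a standard.
from dataclasses import dataclass, field

@dataclass
class IncidentStatus:
    is_outage: bool                  # Is this an outage? (yes/no)
    users_affected: int              # How many users are affected?
    broken_service: str              # What's broken? (service name)
    error_type: str                  # What's broken? (error type)
    cascading_services: list[str] = field(default_factory=list)  # Blast radius
    rollback_target: str | None = None  # Can we roll back? (deploy to undo)

status = IncidentStatus(
    is_outage=True,
    users_affected=42_000,
    broken_service="checkout",
    error_type="db_timeout",
    cascading_services=["payments", "email"],
    rollback_target="deploy-2024-11-29-0312",
)
print(status)
```

If a metric on the dashboard doesn't map back to one of these fields, it belongs in the optional tier.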
### 6. Organization Transformation Gravity

The mistake: you modernize the tech stack:
- Microservices? ✓
- DevOps? ✓
- Cloud? ✓

But you keep the 1995 org structure:
- Approval process: 12 months
- Hiring: 3 months per person
- Team structure: silos by function

Result: the DevOps team can deploy in 5 minutes but must wait 12 months for approval.

Case: a telecom company. Modernized to DevOps + microservices. Hired fancy architects. Built beautiful infrastructure. But:

- Feature requests went through the legacy billing system's approval process
- The billing system was sold as "modules" (no flexibility)
- Customer contracts were 12-month fixed scope
- The Agile dev team clashed with fixed-scope requirements

Three years later: $50M spent, zero improvement in time-to-market, three leadership changes.

The better approach: transform the org first. Tech follows, not the reverse. Ask:

- How fast can we make decisions? (1 week? 1 month? 1 quarter?)
- How much autonomy do teams have? (full? subject to approval?)
- How aligned are incentives? (shipping fast? cost control? risk aversion?)
- Can we move people between teams? (reorg cost? retention risk?)

Legacy gravity is stronger than new technology. You can't microservices your way out of broken incentives.
### 7. Vendor Lock-In Optimized as Innovation

The mistake: you choose the "leading vendor, industry-standard, great case studies." Meanwhile:

- The vendor optimizes for lock-in (proprietary APIs, custom languages, high switching costs)
- The vendor's roadmap is driven by its highest-paying customers (usually not you)
- The vendor gets acquired by a new owner who kills the product line

Result: you're stuck (switching cost: $5M+) or forced to rewrite.

Case: a fintech company chose a vendor for its "core platform." The vendor's messaging:

- "Built for fintech"
- "Enterprise-grade"
- "Zero downtime deployments"

Three years in:

- The vendor was acquired
- The new owner killed the product line
- Migration required a complete rewrite
- Cost: $8M, 18 months, three departures

The better approach: assume vendors will disappoint.
- Use open standards (SQL, REST, standard frameworks)
- Own critical data flows (never let a vendor own your data)
- Keep switching costs low (avoid proprietary APIs; see the adapter sketch after this list)
- Plan the exit (what would it cost to migrate?)
- Diversify risk (use multiple vendors where possible)
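As one illustration of "keep switching costs low," a thin adapter can confine a vendor SDK to a single module, so the exit cost is one adapter rather than the whole codebase. This is a sketch under assumed names: `PaymentGateway`, `AcmePayAdapter`, and the vendor's `create_charge` call are all hypothetical:

```python
# Sketch: isolate a vendor behind an interface you own. All names here
# are illustrative; the vendor SDK and its methods are hypothetical.
from typing import Protocol

class PaymentGateway(Protocol):
    """The interface your code depends on; no vendor types leak past it."""
    def charge(self, customer_id: str, amount_cents: int) -> str: ...

class AcmePayAdapter:
    """Wraps the hypothetical vendor SDK; the only module that imports it."""
    def __init__(self, client):  # client: the vendor's SDK object
        self._client = client

    def charge(self, customer_id: str, amount_cents: int) -> str:
        # Translate vendor-specific calls and IDs here, in one place.
        receipt = self._client.create_charge(customer=customer_id,
                                             amount=amount_cents)
        return receipt["id"]

def checkout(gateway: PaymentGateway, customer_id: str, amount_cents: int) -> str:
    # Business logic sees only the Protocol, never the vendor.
    return gateway.charge(customer_id, amount_cents)
```

Swapping vendors then means writing one new adapter, not rewriting every call site.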
## The Decision Framework That Actually Works

Step 1: Name the decision (vendor, execution model, tech stack, scaling, migration strategy, etc.)

Step 2: Model primary consequences (6–18 months):
- How much engineering effort?
- How much operational burden?
- What's the learning curve?
- What's the cost trajectory?

Step 3: Model secondary consequences (18–36 months):
- Migration cost (how hard is it to undo?)
- Vendor risk (acquisition? shutdown?)
- Talent risk (hiring? retention?)
- Organizational risk (cultural change needed?)

Step 4: Model tertiary consequences (3+ years):
- Lock-in (are we stuck forever?)
- Scaling limits (where does this break?)
- Obsolescence (outdated in 5 years?)
- Opportunity cost (what can't we do?)
Step 5: Decide transparently.

You're not choosing "best." You're choosing "best trade-off given constraints." Document it. Six months from now, you'll need to remember why you made this call.
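What "document it" might look like, as a minimal sketch. The fields simply mirror Steps 1–4 above; the keys and example values are my own, not a formal template:

```python
# Sketch: a lightweight decision record mirroring Steps 1-4 above.
# Keys and example values are illustrative assumptions.

decision_record = {
    "decision": "Adopt vendor X for the core platform",
    "date": "2025-01-15",
    "primary_6_18mo": {
        "engineering_effort": "2 teams, ~2 quarters",
        "operational_burden": "new on-call rotation",
        "learning_curve": "medium",
        "cost_trajectory": "flat, then per-seat growth",
    },
    "secondary_18_36mo": {
        "migration_cost": "high (proprietary data model)",
        "vendor_risk": "acquisition plausible",
        "talent_risk": "niche skills, hard to hire",
        "organizational_risk": "low",
    },
    "tertiary_3yr_plus": {
        "lock_in": "high unless we own data flows",
        "scaling_limits": "unknown past 10x load",
        "obsolescence": "roadmap unclear",
        "opportunity_cost": "delays in-house platform work",
    },
    "trade_off_accepted": "speed now vs. lock-in later",
}
```

Six months later, this is the artifact that answers "why did we pick this?"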

## Test Your Own Judgment

I built a simple simulator that walks you through this framework in 2 minutes. You make architecture decisions: https://execution-ledger.vercel.app

No login. No signup. Just test yourself.