DEV Community

Alejandro Sierra
Raising the Bar: Driving Code Health Across 25+ Teams at Full Speed


Many development teams are deadline-driven but not quality-driven.

Releases happen. Features ship.

But few teams can answer:

  • Which quality gates did we pass?
  • What defect detection ratio do we have?
  • What mutation coverage are we running?
  • Do we block a release if tests fail?
  • Are we protecting integration between teams?

Quality is intrinsic to the product. Whether you are Apple, Amazon, or JohnDoeShoes.com, stakeholders expect stability, reliability, and trust.

As a Quality & Delivery Excellence Lead, I led a structured quality transformation across 25+ teams in production environments without impacting delivery velocity.

Phase 1 — Analysis: Establishing the Baseline

Before improving anything, we needed visibility.

Most teams believe they know their delivery health. Very few actually measure it.

The first phase was a structured discovery process focused on understanding how teams truly operated, not how they thought they operated.

1. SDLC Mapping
We mapped how code moved from commit to production:

  • What tests were executed and when
  • What environments were used and how stable they were
  • Whether releases were blocked by quality gates or just by deadlines
  • How integration between teams was validated

The goal was not to judge. It was to observe patterns.

2. Defining Baseline KPIs
We established measurable quality indicators:

  • Defect Detection Ratio (DDR): how many defects were detected before production vs. total defects.
  • Defect Leakage Ratio (DLR): what percentage of issues escaped into production.

These two metrics immediately reveal the maturity of the testing strategy.

3. Mutation Testing Baseline
Coverage alone is misleading. You can have 85% line coverage and still have weak tests.

So we introduced mutation testing to evaluate test robustness:

  • Stryker for Front-End
  • PIT for Back-End

We measured:

  • Mutation coverage
  • Line coverage
  • Number of tested classes
  • Survived mutations

This gave us a real picture of test strength, not just quantity.
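To make the Front-End baseline reproducible, a Stryker configuration along these lines can be dropped into each repository. This is a minimal sketch; the glob patterns and threshold values are illustrative, not the exact ones we standardized:

```json
{
  "$schema": "./node_modules/@stryker-mutator/core/schema/stryker-schema.json",
  "mutate": ["src/**/*.ts", "!src/**/*.spec.ts"],
  "testRunner": "jest",
  "reporters": ["html", "clear-text", "progress"],
  "thresholds": { "high": 80, "low": 60, "break": 50 }
}
```

The `break` threshold is what turns the report into a gate: Stryker exits with a non-zero code when the mutation score falls below it, which is what a CI pipeline needs to block a merge.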

The outcome of Phase 1 was simple: We replaced perception with data.

Front-End & Back-End — Mutation Testing (Stryker and Pitest)

During Phase 1, the initial mutation score revealed high line coverage but weak mutation resistance, indicating superficial test assertions.

STRYKER PHASE 1. Example mutation report

PITEST PHASE 1. Example mutation report
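On the Back-End, the equivalent baseline can be captured with the `pitest-maven` plugin. A minimal sketch, assuming a Maven build; the plugin version, base package, and threshold are placeholders:

```xml
<plugin>
  <groupId>org.pitest</groupId>
  <artifactId>pitest-maven</artifactId>
  <version>1.15.3</version>
  <configuration>
    <!-- com.example.* is a placeholder for the team's base package -->
    <targetClasses>
      <param>com.example.*</param>
    </targetClasses>
    <targetTests>
      <param>com.example.*</param>
    </targetTests>
    <!-- Fail the build when the mutation score drops below the gate -->
    <mutationThreshold>60</mutationThreshold>
    <outputFormats>
      <outputFormat>HTML</outputFormat>
      <outputFormat>XML</outputFormat>
    </outputFormats>
  </configuration>
</plugin>
```

Projects on JUnit 5 also need the `pitest-junit5-plugin` dependency declared inside the plugin for PIT to discover the tests.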


Phase 2 — Strategy & Implementation: Building a Sustainable System

The objective was never to increase coverage for one quarter.
The objective was to embed quality into the SDLC permanently.
This phase focused on system design, not isolated fixes.

Strategic Pillars

1. Tooling Standardization
We defined clear tooling by technology stack:

  • Mutation testing per layer (FE/BE)
  • Unit testing frameworks aligned across teams
  • Static analysis tooling defined and respected

Standardization reduces variability and accelerates adoption.

2. Git Hook Governance
To protect the integrity of the codebase before CI even starts, we implemented:

  • Husky templates for Front-End
  • Spotless enforcement for Back-End

These hooks ensured:

  • Code formatting compliance
  • Linting rules
  • Test execution pre-push (when applicable)
  • Prevention of low-quality commits

Quality begins before CI/CD.

Pre-commit hooks used with Husky:


```sh
# Check for sensitive data
echo "🔍 Checking for sensitive data..."
SENSITIVE_PATTERNS="(password|passwd|pass|pwd|secret|token|key|api_key|private_key|auth_token|auth|bearer|basic_auth|credentials|client_secret|access_token|refresh_token|jwt|cookie|session)"
SENSITIVE_FILES=$(git diff --cached --name-only | grep -v -E '^\.husky/|^eslint\.config\.|\.env\.example$' | xargs grep -l -i -E "$SENSITIVE_PATTERNS" 2>/dev/null || true)

if [ -n "$SENSITIVE_FILES" ]; then
    echo "❌ ALERT: Potential sensitive data detected in:"
    echo "$SENSITIVE_FILES"
    echo "🚫 Commit aborted"
    exit 1
fi
echo "✅ No sensitive data detected"

# Commented-out code detection
echo "🔍 Checking for commented-out code..."
COMMENT_FILES=$(git diff --cached --name-only | grep -E '\.(ts|tsx|js|jsx)$' || true)
if [ -n "$COMMENT_FILES" ]; then
    COMMENTED_CODE=$(echo "$COMMENT_FILES" | xargs awk '
    {
        if ($0 ~ /^\s*\/\//) {
            print FILENAME":"NR":"$0" -> Possible commented-out code"
        }
    }' 2>/dev/null)
    if [ -n "$COMMENTED_CODE" ]; then
        echo "⚠️ Commented-out code detected:"
        echo "$COMMENTED_CODE" | head -5
        echo "💡 Recommendation: Please remove unnecessary code. If temporary, use // TODO: or // FIXME:"
        exit 1
    fi
fi
echo "✅ No problematic commented-out code detected"
```

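For the Back-End, the Spotless counterpart can be wired into the Maven build so formatting violations fail fast, mirroring what Husky does for the Front-End. A minimal sketch; the plugin version and formatter choice are assumptions, not the exact setup we standardized:

```xml
<plugin>
  <groupId>com.diffplug.spotless</groupId>
  <artifactId>spotless-maven-plugin</artifactId>
  <version>2.43.0</version>
  <configuration>
    <java>
      <!-- One shared formatter removes style debates from code review -->
      <googleJavaFormat/>
      <removeUnusedImports/>
    </java>
  </configuration>
  <executions>
    <execution>
      <!-- `spotless:check` fails the build on violations; `spotless:apply` fixes them locally -->
      <goals>
        <goal>check</goal>
      </goals>
    </execution>
  </executions>
</plugin>
```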

3. Defined Quality Gates
We formalized:

  • Minimum mutation thresholds
  • Coverage expectations
  • Release blocking criteria
  • Checkpoints within the delivery cycle
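As a sketch of how a release-blocking criterion can be enforced mechanically, a CI step only needs to compare the reported score against the agreed gate. The variable values below are illustrative; in a real pipeline `MUTATION_SCORE` would be parsed from the Stryker or PIT report:

```sh
#!/bin/sh
# Hypothetical gate values; a real pipeline would extract MUTATION_SCORE
# from the mutation report (e.g. Stryker's JSON or PIT's XML output).
MUTATION_SCORE=72
THRESHOLD=60

# Whole-percent integer comparison keeps this POSIX-sh compatible
if [ "$MUTATION_SCORE" -lt "$THRESHOLD" ]; then
    echo "🚫 Quality gate failed: mutation score ${MUTATION_SCORE}% is below ${THRESHOLD}%"
    exit 1
fi
echo "✅ Quality gate passed: mutation score ${MUTATION_SCORE}% meets the ${THRESHOLD}% threshold"
```

Because the script exits non-zero on failure, any CI system treats the gate as a hard stop with no extra integration work.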

4. Standardized Quality Framework Repositories

Git Hooks Repository (Husky Templates)
Husky Validation: Monorepo PoC for Code Quality

  • Centralized pre-commit and pre-push hooks
  • Formatting, linting and test enforcement
  • GitHub-based reusable templates

Front-End Mutation Framework (Stryker)
Mutation Testing with Stryker and Jest

  • Preconfigured mutation testing setup
  • Defined mutation thresholds
  • Standardized reporting structure

Back-End Mutation Framework (PIT)
Mutation Testing with Pitest in Spring Boot

  • Unified PIT configuration
  • CI-ready execution
  • Consistent mutation score baseline

These repositories acted as accelerators.
Instead of asking teams to "improve quality", we gave them a ready-to-adopt system.

5. QA as the Change Enabler

Instead of centralizing control, we empowered QA as the transformation driver.

QA received:

  • Formal mutation testing training
  • Centralized repositories:
    • Git hooks templates
    • PIT templates
    • Stryker templates
    • Reporting frameworks

They acted as:

  • Evangelists
  • 1:1 technical mentors
  • Quality auditors
  • Delivery safeguards

This shifted QA from reactive testing to proactive quality governance.

And most importantly:

Quality became collaborative, not imposed.

Delivery Improvement in SDLC

Phase 3 — Measurement & Tracking: Making Progress Visible

Transformation without tracking is illusion.

We defined recurring measurement cycles aligned to sprints and releases.

We tracked:

  • Defect Detection Ratio (DDR): DDR (%) = (Defects detected before production / Total existing defects) × 100
  • Defect Leakage Ratio (DLR): DLR (%) = (Defects found in a later phase or production / Total defects detected) × 100
  • Mutation Coverage (FE & BE)
  • Line Coverage
  • Survived mutations
  • Class coverage growth
  • Release stability
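The two ratio formulas above can be sanity-checked with a quick worked example; the defect counts here are hypothetical:

```sh
#!/bin/sh
# Hypothetical sprint numbers
DETECTED=47   # defects caught before production
LEAKED=3      # defects found in a later phase or production
TOTAL=$((DETECTED + LEAKED))

# DDR (%) = defects detected before production / total existing defects * 100
DDR=$(awk -v d="$DETECTED" -v t="$TOTAL" 'BEGIN { printf "%.1f", d / t * 100 }')
# DLR (%) = defects found later / total defects detected * 100
DLR=$(awk -v l="$LEAKED" -v d="$DETECTED" 'BEGIN { printf "%.1f", l / d * 100 }')

echo "DDR=${DDR}%  DLR=${DLR}%"   # DDR=94.0%  DLR=6.4%
```

A high DDR with a low DLR is the pattern a mature testing strategy should converge toward.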

We established checkpoints to identify:

  • Knowledge silos
  • Teams requiring additional coaching
  • High-performing teams to use as benchmarks

Tracking allowed us to detect:

  • Coverage stagnation
  • Regression in test robustness
  • Overconfidence without real improvement

Most importantly, it allowed leadership to see quality evolution as a business metric, not just a technical one.


Phase 4 — Measurable Impact: Demonstrating Structural Improvement

Impact must be observable over time.

We measured a full quarter (Q), comparing:

  • Initial baseline (A)
  • End-of-quarter state (Z)

This comparison spanned both Front-End and Back-End.

The improvement was calculated as the aggregated delta between the start and end of the quarter across both layers.

This allowed us to evaluate structural evolution not isolated spikes.

Front-End — Mutation Testing (Stryker) and Back-End — Mutation Testing (PIT)

During Phase 4, we measured mutation score evolution across the quarter.
We evaluated test strength at the bytecode level.
The goal was to measure resilience against artificial faults, ensuring real defect-detection capacity.

Results

Across more than 25 production teams, the transformation generated measurable structural improvements:

  • Significant average improvement in overall code health across both Front-End and Back-End layers
  • Strong individual team uplifts, with multiple teams tripling their mutation resistance baseline
  • Near-total defect detection before production release
  • Near-total containment of production incidents
  • Noticeable uplift in both greenfield and brownfield environments
  • Stabilization of previously closed or legacy-heavy projects
  • Substantial improvement in legacy-dominated systems where test robustness was historically weak

Exact values have been intentionally abstracted to preserve confidentiality. The trends and proportional improvements reflect real production environments.

More importantly:

Delivery velocity was not negatively impacted.

In several cases, it improved due to reduced rework and fewer production incidents.


Final Reflection

Quality is not an optional enhancement.

It directly impacts:

  • Credibility
  • Reliability
  • Scalability
  • Stakeholder trust
  • Financial performance

Whether you are Apple, Amazon, or a small digital startup, quality defines how your product is perceived.
And quality is not owned by QA alone. It is a shared responsibility across the SDLC.
The role of leadership is not to enforce quality once but to design systems where quality becomes the default behavior.

Top comments (3)

Guilherme Zaia

Phase 1 is solid, but here's the gap: mutation testing alone doesn't prove architectural robustness.

Example: 90% mutation score on a service riddled with temporal coupling or tight DB dependencies? You're testing implementation, not behavior isolation.

Real question: Did you enforce Contract Testing between those 25 teams? Because DDR drops to zero when Team A's "passing tests" break Team B's integration silently.

Mutation coverage measures test strength. Architecture resilience requires boundary contracts (Pact/Spring Cloud Contract). Without them, you optimized the wrong layer.

Alejandro Sierra

The evolution across the 25 teams was not limited to unit testing alone; that was just the first layer of a broader transformation.

At the beginning, there were no unit tests at all, so we intentionally started at the base of the test pyramid. Unit tests are the least costly, fastest to execute, and easiest to integrate into a CI pipeline. They also introduce the lowest coordination overhead across teams. The goal in Phase 1 was to establish fast feedback and enforce method-level correctness.

In Phase 2, we moved upward in the pyramid and focused on integration:

  • Component tests were implemented using Testcontainers for backend services.
  • Frontend components were validated using Testing Library.
  • Cross-product integration tests were introduced to validate interactions between services.

So the strategy was sequential but layered:

  1. First, guarantee method-level correctness and behavior isolation through unit tests.
  2. Then ensure service and cross-service integration stability.
  3. In parallel, we added domain-level and lower-level logic validations (not just interface checks), including specific focused tests to validate internal business rules beyond API contracts.

Mutation testing was used to strengthen unit tests, not to claim architectural resilience. Architectural robustness was addressed in the integration phase through containerized component tests and product-level integration validation.

In short: we didn't optimize the wrong layer; we built the pyramid bottom-up, then reinforced the boundaries.
