myougaTheAxo

Posted on Mar 11

Automated Technical Debt Detection with Claude Code /refactor-suggest

#claudecode #refactoring #codequality #ai

Technical debt accumulates silently. A function that started at 20 lines grows to 120. A variable named x makes sense to the original author at 2am but confuses everyone else six months later. Duplicate logic spreads across three files because nobody had time to refactor.

The /refactor-suggest skill for Claude Code brings a structured, automated approach to detecting these problems before they become architectural crises.

What Is /refactor-suggest?

/refactor-suggest is a Claude Code custom skill that performs static analysis across your codebase and surfaces refactoring opportunities ranked by severity. Unlike a linter that checks for style violations, this skill reasons about code structure — identifying patterns that are technically correct but will become maintenance liabilities.

Run it against any directory:

/refactor-suggest src/

Or target a specific file:

/refactor-suggest src/api/user_service.py

The output is a prioritized list of findings, each with a severity rating, location, explanation, and a concrete suggestion.

The 4 Detection Axes

The skill evaluates code across four dimensions. Each maps to a distinct class of technical debt.

Axis 1: Cyclomatic Complexity

Cyclomatic complexity measures the number of independent paths through a function. A function with no branches has complexity 1. Each if, for, while, case, or catch adds 1.

The thresholds used:

Score	Label	Action
1–5	Low	No action needed
6–10	Medium	Consider splitting
11–20	High	Refactor recommended
21+	Critical	Refactor required

High complexity tends to go hand in hand with higher bug density. McCabe (1976), who introduced the metric, proposed 10 as a reasonable upper limit for a single module, and many teams and tools still treat that value as the point where a function should be reviewed for splitting.
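The counting rule and the threshold table above can be sketched for Python source with the standard `ast` module. This is only an illustration of the metric, not the skill's implementation. The names `cyclomatic_complexity`, `label`, and `score_functions` are made up for this example, and nested functions are counted as part of their parent for simplicity.

```python
import ast

# Node types that each add one decision point. This mirrors the rule in the
# text (if, for, while, case, catch) and is an approximation of the metric.
_DECISION_NODES = [ast.If, ast.For, ast.While, ast.ExceptHandler]
if hasattr(ast, "match_case"):  # `case` clauses exist from Python 3.10 onward
    _DECISION_NODES.append(ast.match_case)
DECISION_NODES = tuple(_DECISION_NODES)


def cyclomatic_complexity(func: ast.AST) -> int:
    """Return 1 plus the number of decision points found inside the function."""
    return 1 + sum(isinstance(node, DECISION_NODES) for node in ast.walk(func))


def label(score: int) -> str:
    """Map a score to the labels from the threshold table."""
    if score <= 5:
        return "Low"
    if score <= 10:
        return "Medium"
    if score <= 20:
        return "High"
    return "Critical"


def score_functions(source: str) -> dict:
    """Return {function name: (score, label)} for every function in the source."""
    results = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef)):
            score = cyclomatic_complexity(node)
            results[node.name] = (score, label(score))
    return results


SAMPLE = '''
def flat(x):
    return x + 1

def branchy(items):
    total = 0
    for item in items:
        if item > 0:
            total += item
        elif item < -10:
            total -= 1
    return total
'''

print(score_functions(SAMPLE))
```

In this sketch an `elif` counts as its own decision point because Python parses it as a nested `if`, which matches the "each if adds 1" rule.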

Axis 2: Code Duplication

The skill detects semantic duplication — not just copy-paste, but functionally equivalent logic expressed differently. This includes:

Identical code blocks with different variable names
Near-duplicate functions that differ only in a parameter or output type
Repeated conditional chains that could be replaced with a lookup table or strategy pattern
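The first bullet (identical blocks with different variable names) is the easiest one to approximate mechanically. The sketch below renames parameters and local variables to positional placeholders and compares the resulting syntax trees. It is a simplified stand-in rather than what the skill does, since the skill reasons about semantics instead of comparing trees. `fingerprint` and `find_duplicates` are hypothetical helper names, and the sample sources are shortened versions of the email validators used later in this article.

```python
import ast
import copy
from collections import defaultdict


def _local_names(func: ast.AST) -> set:
    """Collect parameter names and names assigned inside the function."""
    names = {node.arg for node in ast.walk(func) if isinstance(node, ast.arg)}
    names |= {
        node.id
        for node in ast.walk(func)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
    }
    return names


class _Normalizer(ast.NodeTransformer):
    """Rename local names to v0, v1, ... in order of first appearance."""

    def __init__(self, local_names):
        self.local_names = local_names
        self.mapping = {}

    def _rename(self, name):
        if name not in self.local_names:
            return name  # globals and builtins such as `re` or `bool` stay as-is
        return self.mapping.setdefault(name, f"v{len(self.mapping)}")

    def visit_arg(self, node):
        node.arg = self._rename(node.arg)
        return node

    def visit_Name(self, node):
        node.id = self._rename(node.id)
        return node


def fingerprint(func: ast.FunctionDef) -> str:
    """Return a string that is equal for functions differing only in naming."""
    clone = copy.deepcopy(func)
    clone.name = "_"  # the function's own name should not affect the comparison
    _Normalizer(_local_names(clone)).visit(clone)
    return ast.dump(clone)


def find_duplicates(sources: dict) -> list:
    """Group functions from {filename: source} that share a fingerprint."""
    groups = defaultdict(list)
    for filename, source in sources.items():
        for node in ast.walk(ast.parse(source)):
            if isinstance(node, ast.FunctionDef):
                groups[fingerprint(node)].append(f"{filename}:{node.name}")
    return [names for names in groups.values() if len(names) > 1]


USER_SERVICE = '''
def validate_email_format(email):
    pattern = "^[^@]+@[^@]+$"
    return bool(re.match(pattern, email))
'''

ADMIN_SERVICE = '''
def admin_validate_email(email_address):
    regex = "^[^@]+@[^@]+$"
    return bool(re.match(regex, email_address))

def count_admins(users):
    return len([u for u in users if u.is_admin])
'''

print(find_duplicates({"user_service.py": USER_SERVICE, "admin_service.py": ADMIN_SERVICE}))
```

Tree comparison of this kind only catches renamed copies. Near-duplicates that differ in a parameter or in statement order need a similarity measure or the kind of reasoning the skill applies.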

Axis 3: Naming Quality

Poor naming is invisible technical debt. The skill flags:

Single-character variables outside of loop counters (i, j in for loops are acceptable)
Abbreviations that aren't industry-standard (usr, mgr, proc)
Generic names that carry no domain information (data, info, result, temp)
Boolean variables without the is_, has_, can_ prefix convention
Functions whose name doesn't match their behavior (detected by comparing function name to its actual operations)
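A few of these rules are simple enough to express as a small checker. The sketch below covers only the first three bullets for assigned variables in Python source. The word lists are taken from the examples above, and `naming_findings` is a hypothetical name. The boolean-prefix and name-versus-behavior checks need type or semantic information and are left out.

```python
import ast

# Word lists copied from the examples in the text; a real rule set would be larger.
NONSTANDARD_ABBREVIATIONS = {"usr", "mgr", "proc"}
GENERIC_NAMES = {"data", "info", "result", "temp"}
LOOP_COUNTERS = {"i", "j"}


def naming_findings(source: str) -> list:
    """Return (line, name, reason) for the first assignment of each flagged name."""
    tree = ast.parse(source)

    # Names bound directly as the target of a for loop, e.g. `for i in ...`
    loop_targets = {
        node.target.id
        for node in ast.walk(tree)
        if isinstance(node, ast.For) and isinstance(node.target, ast.Name)
    }

    # All assignment targets, in source order
    stores = sorted(
        (
            node
            for node in ast.walk(tree)
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store)
        ),
        key=lambda node: (node.lineno, node.col_offset),
    )

    findings = []
    seen = set()
    for node in stores:
        name = node.id
        if name in seen:
            continue
        seen.add(name)
        is_loop_counter = name in LOOP_COUNTERS and name in loop_targets
        if len(name) == 1 and not is_loop_counter:
            findings.append((node.lineno, name, "single-character name"))
        elif name in NONSTANDARD_ABBREVIATIONS:
            findings.append((node.lineno, name, "non-standard abbreviation"))
        elif name in GENERIC_NAMES:
            findings.append((node.lineno, name, "generic name"))
    return findings


SAMPLE = '''
def generate_monthly_report(rows):
    d = rows[0].date
    data = []
    for i in range(len(rows)):
        usr = rows[i].user
        data.append(usr)
    return d, data
'''

for finding in naming_findings(SAMPLE):
    print(finding)
```

Function parameters are not checked in this sketch, only names that are assigned inside the analyzed source.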

Axis 4: Dead Code

Code that exists but is never executed creates confusion about intent and bloats the codebase. The skill identifies:

Unreachable statements after return/throw
Functions that are defined but never called within the analyzed scope
Commented-out code blocks (treated as a soft warning — sometimes intentional)
Feature flags that are hardcoded to False/disabled
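The first two checks have a straightforward syntactic approximation for Python. The sketch below is illustrative only: `unreachable_lines` and `uncalled_functions` are hypothetical names, and "never called" means never referenced by name anywhere in the analyzed source, so entry points invoked from outside that scope would be reported as well.

```python
import ast

TERMINATORS = (ast.Return, ast.Raise)


def unreachable_lines(tree: ast.AST) -> list:
    """Line numbers of statements that directly follow a return or raise."""
    lines = []
    for node in ast.walk(tree):
        for field in ("body", "orelse", "finalbody"):
            block = getattr(node, field, None)
            if not isinstance(block, list):
                continue  # e.g. a lambda body is a single expression, not a block
            for index, stmt in enumerate(block[:-1]):
                if isinstance(stmt, TERMINATORS):
                    lines.append(block[index + 1].lineno)
                    break  # report only the first dead statement per block
    return sorted(lines)


def uncalled_functions(tree: ast.AST) -> list:
    """Names of functions that are defined but never referenced in the tree."""
    defined = {
        node.name
        for node in ast.walk(tree)
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef))
    }
    referenced = {
        node.id
        for node in ast.walk(tree)
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load)
    }
    # Method-style references such as service.send_email(...)
    referenced |= {node.attr for node in ast.walk(tree) if isinstance(node, ast.Attribute)}
    return sorted(defined - referenced)


SAMPLE = '''
def send_email(message):
    return deliver(message)
    log("sent")

def send_sms_notification(message):
    raise NotImplementedError

def notify(message):
    send_email(message)

notify("hello")
'''

tree = ast.parse(SAMPLE)
print(unreachable_lines(tree))
print(uncalled_functions(tree))
```

Because the second check depends on what is in scope, running it on a single file will usually report more functions than running it on the whole project.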

Example Output

Running against a sample Python service:

/refactor-suggest src/services/

[CRITICAL] src/services/order_processor.py:45 — process_order()
  Axis       : Cyclomatic Complexity
  Score      : 24 (threshold: 10)
  Summary    : This function handles 8 distinct payment methods, 3 inventory states,
               and 4 shipping scenarios in a single 180-line block. Any change to
               payment logic requires understanding all shipping logic.
  Suggestion : Extract payment validation to process_payment(), inventory check to
               check_inventory(), and shipping logic to calculate_shipping(). Each
               should be independently testable.

[HIGH] src/services/user_service.py:112 & src/services/admin_service.py:67
  Axis       : Code Duplication
  Similarity : 94%
  Summary    : validate_email_format() and admin_validate_email() are functionally
               identical. Both use the same regex, the same error messages, and the
               same return structure.
  Suggestion : Extract to a shared utils/validation.py module. Both callers import
               from the same source.

[HIGH] src/services/report_generator.py:23
  Axis       : Naming Quality
  Variable   : d (line 23), res (line 31), tmp (line 45)
  Summary    : Three variables in generate_monthly_report() use abbreviations that
               require context to decode. In a 90-line function, readers must track
               what each abbreviation represents.
  Suggestion : Rename d → report_date, res → aggregated_results, tmp → row_buffer

[MEDIUM] src/services/notification_service.py:89
  Axis       : Dead Code
  Finding    : send_sms_notification() is defined but has 0 call sites in the
               analyzed scope. Last modified 14 months ago.
  Suggestion : If SMS notifications are no longer needed, remove this function and
               its dependencies (twilio import, SMS_API_KEY config). If planned for
               future use, add a TODO comment with the relevant ticket number.

---
Summary: 1 CRITICAL, 2 HIGH, 1 MEDIUM, 0 LOW
Estimated refactor effort: ~4 hours
Files analyzed: 12 | Functions analyzed: 87

Before/After Refactoring Examples

Example 1: Reducing Cyclomatic Complexity

Before (complexity: 24):

def process_order(order, user, inventory, shipping_config):
    if order.payment_method == "credit_card":
        if order.amount > 1000:
            if user.is_verified:
                ...  # 15 lines of credit card logic
            else:
                raise UnverifiedUserError()
        else:
            ...  # 10 lines
    elif order.payment_method == "paypal":
        if inventory.is_available(order.items):
            ...  # 20 lines of PayPal logic
    # ... continues for 6 more payment methods

After (complexity per function: 3–5):

def process_order(order, user, inventory, shipping_config):
    payment_result = process_payment(order, user)
    inventory_result = check_inventory(order.items, inventory)
    shipping_result = calculate_shipping(order, shipping_config)
    return OrderResult(payment_result, inventory_result, shipping_result)

def process_payment(order, user):
    handler = PAYMENT_HANDLERS.get(order.payment_method)
    if handler is None:
        raise UnsupportedPaymentMethod(order.payment_method)
    return handler(order, user)

# Dispatch table: _handle_credit_card and _handle_paypal must be defined
# above this point in the module
PAYMENT_HANDLERS = {
    "credit_card": _handle_credit_card,
    "paypal": _handle_paypal,
    # ...
}

Each function now has a single responsibility and can be tested in isolation.
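To make "tested in isolation" concrete, here is a minimal sketch of the dispatch-table version with two unit tests. The `Order` dataclass, the stub handler bodies, and their return values are placeholders invented for this example. Only `process_payment`, `PAYMENT_HANDLERS`, and `UnsupportedPaymentMethod` come from the refactored code above.

```python
from dataclasses import dataclass


class UnsupportedPaymentMethod(Exception):
    pass


@dataclass
class Order:  # placeholder for the real order model
    payment_method: str
    amount: float


def _handle_credit_card(order, user):
    return f"credit_card:{order.amount}"  # stub standing in for the real logic


def _handle_paypal(order, user):
    return f"paypal:{order.amount}"  # stub standing in for the real logic


PAYMENT_HANDLERS = {
    "credit_card": _handle_credit_card,
    "paypal": _handle_paypal,
}


def process_payment(order, user):
    handler = PAYMENT_HANDLERS.get(order.payment_method)
    if handler is None:
        raise UnsupportedPaymentMethod(order.payment_method)
    return handler(order, user)


def test_dispatches_to_registered_handler():
    assert process_payment(Order("paypal", 25.0), user=None) == "paypal:25.0"


def test_rejects_unknown_method():
    try:
        process_payment(Order("barter", 1.0), user=None)
    except UnsupportedPaymentMethod as exc:
        assert str(exc) == "barter"
    else:
        raise AssertionError("expected UnsupportedPaymentMethod")


test_dispatches_to_registered_handler()
test_rejects_unknown_method()
```

Neither test needs inventory or shipping fixtures, which is the practical payoff of the split: payment dispatch can be exercised without setting up the rest of the order flow.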

Example 2: Eliminating Duplication

Before (duplicated across 2 files):

# user_service.py
import re

def validate_email_format(email):
    pattern = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(pattern, email))

# admin_service.py
import re

def admin_validate_email(email_address):
    regex = r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$'
    return bool(re.match(regex, email_address))

After (single source of truth):

# utils/validation.py
import re

# Compiled once at import time and shared by every caller
EMAIL_PATTERN = re.compile(r'^[a-zA-Z0-9._%+-]+@[a-zA-Z0-9.-]+\.[a-zA-Z]{2,}$')

def is_valid_email(email: str) -> bool:
    return bool(EMAIL_PATTERN.match(email))

When the email validation rule needs to change (and it will), you change it in one place.

Integration with CI/CD (GitHub Actions)

Running /refactor-suggest in your CI pipeline enforces a complexity ceiling across the codebase. A pull request that introduces CRITICAL complexity fails this job, and if the job is a required status check, the pull request cannot be merged until the author addresses the finding.

# .github/workflows/code-quality.yml
name: Code Quality Gate

on: [pull_request]

jobs:
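  refactor-check:
    runs-on: ubuntu-latest
    permissions:
      contents: read
      pull-requests: write  # needed by the PR comment step
    steps:
      - uses: actions/checkout@v4
        with:
          fetch-depth: 0  # full history, so origin/<base>...HEAD can be diffed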

      - name: Install Claude Code
        run: npm install -g @anthropic-ai/claude-code

      - name: Run /refactor-suggest on changed files
        env:
          ANTHROPIC_API_KEY: ${{ secrets.ANTHROPIC_API_KEY }}
        run: |
          # Get source files added or modified in this PR (deleted files are skipped)
          CHANGED=$(git diff --name-only --diff-filter=d origin/${{ github.base_ref }}...HEAD \
            | grep -E '\.(py|ts|js|kt|java)$' | tr '\n' ' ')

          if [ -n "$CHANGED" ]; then
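            # -p runs Claude Code non-interactively, which a CI runner needs;
            # --fail-on-critical is passed through to the skill, not to the CLI
            claude -p "/refactor-suggest $CHANGED --fail-on-critical"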
          fi

      - name: Comment results on PR
        if: failure()
        uses: actions/github-script@v7
        with:
          script: |
            // await so the step fails visibly if the comment cannot be posted
            await github.rest.issues.createComment({
              issue_number: context.issue.number,
              owner: context.repo.owner,
              repo: context.repo.repo,
              body: '`/refactor-suggest` detected CRITICAL complexity issues. See job logs for details.'
            })

This gates on CRITICAL findings only. HIGH, MEDIUM, and LOW findings appear in the logs but don't block the merge. Teams can adjust the threshold based on their tolerance.
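One way to make the threshold adjustable is to post-process the report text in a small script. The sketch below assumes the report uses the `[SEVERITY]` line prefixes shown under Example Output. `count_findings` and `gate_exit_code` are hypothetical helpers for this article, not part of the skill or of Claude Code.

```python
import re

SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]

# Matches the severity tag at the start of a finding, e.g. "[HIGH] src/..."
FINDING_TAG = re.compile(r"^\[(CRITICAL|HIGH|MEDIUM|LOW)\]", re.MULTILINE)


def count_findings(report: str) -> dict:
    """Count findings per severity in a report that uses [SEVERITY] prefixes."""
    counts = {severity: 0 for severity in SEVERITY_ORDER}
    for severity in FINDING_TAG.findall(report):
        counts[severity] += 1
    return counts


def gate_exit_code(report: str, fail_on: str = "CRITICAL") -> int:
    """Return 1 if any finding is at or above `fail_on`, otherwise 0."""
    blocking = SEVERITY_ORDER[SEVERITY_ORDER.index(fail_on):]
    counts = count_findings(report)
    return 1 if any(counts[severity] > 0 for severity in blocking) else 0


# Shortened sample in the format shown under Example Output
REPORT = """\
[CRITICAL] src/services/order_processor.py:45
  Axis       : Cyclomatic Complexity
[HIGH] src/services/user_service.py:112 & src/services/admin_service.py:67
  Axis       : Code Duplication
[HIGH] src/services/report_generator.py:23
  Axis       : Naming Quality
[MEDIUM] src/services/notification_service.py:89
  Axis       : Dead Code
---
Summary: 1 CRITICAL, 2 HIGH, 1 MEDIUM, 0 LOW
"""

print(count_findings(REPORT))
print(gate_exit_code(REPORT, fail_on="CRITICAL"))
```

A team that wants a stricter gate could call `gate_exit_code(report, fail_on="HIGH")` and pass the result to `sys.exit` in the CI step.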

Why This Matters More Than Style Linting

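Most teams already run linters such as ESLint, Pylint, or ktlint, which catch style violations and obvious errors. Some also offer basic complexity limits, but they don't catch semantic duplication or naming that degrades over time, and a numeric limit alone says nothing about how to split a function.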

/refactor-suggest fills the gap between "code that passes CI" and "code that a senior engineer would approve." It surfaces the kind of feedback you'd get in a code review from someone who has seen the long-term consequences of letting complexity grow unchecked.

The ROI is asymmetric: five minutes to run the scan now, versus hours or days debugging an entangled function six months from now.

Get the Code Review Pack

/refactor-suggest is included in the Code Review Pack (¥980), available on PromptWorks.

The pack includes:

/refactor-suggest — Technical debt detection across 4 axes
/review-checklist — Pre-PR checklist generator for your codebase conventions
/complexity-report — Standalone cyclomatic complexity report with trend tracking

→ Get Code Review Pack on PromptWorks

Built by myouga (@myougatheaxo) — Claude Code skills for engineers who care about maintainability.