Introduction
This week, I embarked on a comprehensive refactoring journey for my Repository Context Packager project. The goal wasn't to add new features or fix bugs, but to improve the code's structure, readability, and maintainability, and to squash all my commits into one, paying down technical debt that had accumulated during rapid development.
What I Focused On
My refactoring efforts centered on six key areas, all guided by software engineering principles like DRY (Don't Repeat Yourself) and SOLID, in particular the Single Responsibility Principle (SRP):
1. Reducing Code Duplication
I found the same token calculation formula (Math.round(content.length / 4)) repeated in two different places. This is a classic DRY violation: if the formula needed updating, I'd have to remember to change it in multiple locations.
2. Improving Function Modularity
The command-line option parsing logic was another problem: a long run of repetitive if-statements, all doing basically the same thing with slight variations. I cleaned that up as well, with some help from Cursor.
3. Breaking Down Large Functions
The main action callback function was a monster at 110 lines, handling validation, parsing, execution, and output all in one place. This violated the Single Responsibility Principle spectacularly.
How I Fixed Each Issue
Refactoring #1: Extract Default Ignore Patterns
Before:
export async function collectFiles(...) {
  const defaultIgnore = [
    'node_modules/**',
    '.git/**',
    '*.log',
  ];
}
After:
const DEFAULT_IGNORE_PATTERNS = [
  'node_modules/**',
  '.git/**',
  '*.log',
];
export async function collectFiles(...) {}
Impact: This made the ignore patterns easy to find, document, and modify without touching function logic.
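To make the payoff concrete, here is a minimal, self-contained sketch of how code in the collectFiles style can consume the shared constant. The isIgnored helper and its crude glob handling are my illustration, not from the project (the real code walks the disk and uses proper glob matching):

```typescript
// The extracted constant: ignore patterns now live at module level,
// outside any function logic, so they are easy to find and modify.
const DEFAULT_IGNORE_PATTERNS = ['node_modules/**', '.git/**', '*.log'];

// Hypothetical helper showing one way the constant might be consumed.
// Deliberately crude glob handling for this sketch: a '/**' suffix
// means prefix match, a leading '*' means suffix match.
function isIgnored(filePath: string): boolean {
  return DEFAULT_IGNORE_PATTERNS.some((pattern) => {
    if (pattern.endsWith('/**')) {
      return filePath.startsWith(pattern.slice(0, -2));
    }
    if (pattern.startsWith('*')) {
      return filePath.endsWith(pattern.slice(1));
    }
    return filePath === pattern;
  });
}
```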
Refactoring #2: Extract Token Calculation Function
Before:
const fileTokens = Math.round(content.length / 4);
return files.reduce((total, file) => total + Math.round(file.content.length / 4), 0);
After:
export function calculateTokens(content: string): number {
return Math.round(content.length / 4);
}
const fileTokens = calculateTokens(content);
Impact: Single source of truth for token calculation. If the formula changes, update it once.
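Putting the before and after snippets together, a runnable version looks like this. The formula is the one from the post; totalTokens is a hypothetical name for the reduce shown in the "before" snippet:

```typescript
// Single source of truth for the token heuristic (~4 characters per token).
function calculateTokens(content: string): number {
  return Math.round(content.length / 4);
}

// The aggregation from the "before" snippet, rewritten to reuse the helper
// instead of repeating the formula inside the reduce callback.
function totalTokens(files: { content: string }[]): number {
  return files.reduce((total, file) => total + calculateTokens(file.content), 0);
}
```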
Refactoring #3: Extract Option Parsing Logic
Before:
.action(async (paths, options) => {
  const packagerOptions = {};
  if (options.include) {
    packagerOptions.include = options.include.split(',').map((p: string) => p.trim());
  }
  if (options.exclude) {
    packagerOptions.exclude = options.exclude.split(',').map((p: string) => p.trim());
  }
});
After:
function parseCommandLineOptions(options: any): PackagerOptions {
  const packagerOptions: PackagerOptions = {};
  if (options.include) {
    packagerOptions.include = options.include.split(',').map((p: string) => p.trim());
  }
  return packagerOptions;
}

.action(async (paths, options) => {
  const packagerOptions = parseCommandLineOptions(options);
});
Impact: The action callback became readable. Option parsing is now testable in isolation.
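Because the parser is now a plain function, it can be exercised without Commander at all. A minimal sketch, where the PackagerOptions shape is assumed from the snippets above (the real interface may carry more fields):

```typescript
interface PackagerOptions {
  include?: string[];
  exclude?: string[];
}

// Self-contained version of the extracted parser: takes the raw CLI
// option strings and normalizes them into arrays of trimmed patterns.
function parseCommandLineOptions(options: { include?: string; exclude?: string }): PackagerOptions {
  const packagerOptions: PackagerOptions = {};
  if (options.include) {
    packagerOptions.include = options.include.split(',').map((p) => p.trim());
  }
  if (options.exclude) {
    packagerOptions.exclude = options.exclude.split(',').map((p) => p.trim());
  }
  return packagerOptions;
}
```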
Refactoring #4: Split Large Action Function
Before:
.action(async (paths, options) => {
  // 110 lines of:
  // - input validation
  // - option parsing
  // - output validation
  // - execution
  // - result display
  // - error handling
});
After:
function validateInputPaths(paths: string[]): void { /* ... */ }
function validateOutputPath(outputFile: string): void { /* ... */ }
async function executePackaging(...): Promise<void> { /* ... */ }
function displayResults(...): void { /* ... */ }
.action(async (paths, options) => {
  validateInputPaths(paths);
  const packagerOptions = parseCommandLineOptions(options);
  const outputFile = options.output || 'output.md';
  validateOutputPath(outputFile);
  try {
    await executePackaging(paths, packagerOptions, outputFile);
    displayResults(outputFile, !!options.output);
  } catch (error) {
    console.error(chalk.red(`❌ Error: ${(error as Error).message}`));
    process.exit(1);
  }
});
Impact:
- Action callback reduced from 110 lines to 24 lines
- Each function has one clear purpose
- Much easier to test and debug
- Other parts of the codebase can reuse these functions
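The decomposition above can be sketched end to end with stubbed helpers. The function names come from the post, but the bodies are placeholder assumptions (the post elides them), so this only illustrates how thin the orchestration layer becomes:

```typescript
interface PackagerOptions {
  include?: string[];
  exclude?: string[];
}

// Stubbed helpers: the real bodies live in the project; these
// placeholders only illustrate the control flow.
function validateInputPaths(paths: string[]): void {
  if (paths.length === 0) throw new Error('No input paths provided');
}

function validateOutputPath(outputFile: string): void {
  if (outputFile.trim() === '') throw new Error('Output path is empty');
}

async function executePackaging(
  paths: string[],
  options: PackagerOptions,
  outputFile: string
): Promise<void> {
  // stub: the real function packages the repository and writes the file
}

// runAction mirrors the refactored action callback: validate inputs,
// default the output path, validate it, then delegate.
async function runAction(paths: string[], options: { output?: string }): Promise<string> {
  validateInputPaths(paths);
  const outputFile = options.output || 'output.md';
  validateOutputPath(outputFile);
  await executePackaging(paths, {}, outputFile);
  return outputFile;
}
```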
Refactoring #5: Extract Statistics Collection Class
Before:
// Inside the 167-line analyzeRepository() method:
let currentTokens = 0;
const fileTypes: Record<string, number> = {};
let largestFile: { path: string; lines: number } | null = null;
let totalCharacters = 0;
const processedDirectories = new Set<string>();
// 50 lines of statistics tracking mixed with file processing
After:
// New statistics.ts file
export class RepositoryStatistics {
  private totalCharacters: number = 0;
  private fileTypes: Record<string, number> = {};
  private largestFile: { path: string; lines: number } | null = null;

  public trackFile(fileInfo: FileInfo): void { /* ... */ }
  public wouldExceedTokenLimit(content: string, maxTokens?: number): boolean { /* ... */ }
  public getTotalCharacters(): number { /* ... */ }
}
// In packager.ts:
const stats = new RepositoryStatistics();
stats.trackFile(fileInfo);
Impact:
- Statistics logic is now reusable and testable
- analyzeRepository() became much more readable
- Clear separation of concerns
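A trimmed, runnable sketch of the class makes the testability point tangible. The method bodies below are illustrative guesses, since the post elides the real implementations:

```typescript
interface FileInfo {
  path: string;
  content: string;
  lines: number;
}

// Sketch of RepositoryStatistics with assumed method bodies: it
// accumulates character counts, tallies file extensions, and tracks
// the largest file seen so far.
class RepositoryStatistics {
  private totalCharacters = 0;
  private fileTypes: Record<string, number> = {};
  private largestFile: { path: string; lines: number } | null = null;

  public trackFile(fileInfo: FileInfo): void {
    this.totalCharacters += fileInfo.content.length;
    const ext = fileInfo.path.split('.').pop() ?? '';
    this.fileTypes[ext] = (this.fileTypes[ext] ?? 0) + 1;
    if (this.largestFile === null || fileInfo.lines > this.largestFile.lines) {
      this.largestFile = { path: fileInfo.path, lines: fileInfo.lines };
    }
  }

  public getTotalCharacters(): number {
    return this.totalCharacters;
  }
}
```

Because the class has no dependency on file-system code, a test can feed it synthetic FileInfo objects directly.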
Refactoring #6: Create Error Handling Utilities
Before:
// Scattered throughout the code:
process.stderr.write(`Error: File '${filePath}' not found\n`);
process.stderr.write(`Warning: File '${filePath}' no longer exists, skipping\n`);
process.stderr.write(`Skipping ${filePath}: file too large\n`);
After:
// New logger.ts file
export function logError(message: string): void {
  process.stderr.write(`Error: ${message}\n`);
}
export function logWarning(message: string): void {
  process.stderr.write(`Warning: ${message}\n`);
}
export function logFileError(filePath: string, error: any): void {
  if (error.code === 'ENOENT') {
    logError(`File '${filePath}' not found`);
  } else if (error.code === 'EACCES') {
    logError(`Permission denied reading '${filePath}'`);
  }
  // ... smart error handling
}
// In packager.ts:
logError(`Path '${singlePath}' does not exist`);
logWarning(`File '${filePath}' no longer exists, skipping`);
logFileError(filePath, error);
Impact:
- Consistent error message formatting
- Centralized logging makes it easy to add features like log levels or file logging
- Semantic function names make code self-documenting
The Git Rebase Experience
This is where things got interesting! The instructions called for an interactive rebase to squash all commits into one. Here's what happened:
Initial Attempt: Branch-Based Refactoring Workflow
Instead of performing everything directly on the main branch, I decided to use a safer and cleaner approach by creating a separate refactoring branch.
Step 1: Update main
I started by ensuring my local main branch was up to date:
git checkout main
git pull origin main
Step 2: Create a Refactoring Branch
Next, I created a new branch for all my refactoring work:
git checkout -b refactoring
Step 3: Make and Commit Changes
I made several improvements across the codebase, committing each logical change separately:
git add .
git commit -m "Refactored helper functions for clarity"
# ...edit, stage, and commit again for each subsequent change:
git commit -m "Simplified data handling logic"
git commit -m "Improved maintainability and readability"
Step 4: Squash Commits
After testing and reviewing, I decided to squash the multiple commits into one for a cleaner commit history:
git reset --soft HEAD~6
git commit -m "Refactoring codebase to improve maintainability and code quality"
The --soft flag moved the branch pointer back 6 commits while keeping all the changes staged, allowing me to create one detailed, consolidated commit.
Step 5: Merge Back to main
Once everything looked good, I merged the refactoring branch back into main using a fast-forward merge:
git checkout main
git merge --ff-only refactoring # <-- This ensures a clean, linear history
Step 6: Push to GitHub
Finally, I pushed my updated main branch to GitHub:
git push origin main
This worked perfectly!
This branch-based approach kept my main branch stable, avoided interactive rebase editor issues, and resulted in a clear and professional commit history.
Did I Find Any Bugs?
Surprisingly, no bugs were found during refactoring! However, the refactoring process did reveal several code smells and potential issues:
- Hidden complexity
- Testing difficulty
- Maintenance risks
Did I Break Anything?
No. To guard against regressions, I tested after each refactoring step by running:
npm run build
This ensured TypeScript compilation succeeded and no syntax errors were introduced.
What Went Well:
- Incremental commits: Making 6 small commits made the refactoring process manageable
- Clear commit messages: Each commit message explained exactly what changed
- Soft reset technique: When interactive rebase failed, the soft reset approach was actually simpler and more intuitive
Conclusion
The codebase is now significantly more maintainable, and I'm confident that future changes will be easier to implement. When we write automated tests later, having modular, focused functions will make that process much smoother.
Most importantly, this exercise taught me that refactoring isn't just about cleaning code—it's about making your future self's life easier.