Introduction
This week, I embarked on a comprehensive refactoring journey for my Repository Context Packager project. The goal wasn't to add new features or fix bugs, but to improve the code's structure, readability, and maintainability, and to squash all my commits into one, paying down technical debt that had accumulated during rapid development.
What I Focused On
My refactoring efforts centered on six key areas, all guided by software engineering principles like DRY (Don't Repeat Yourself) and SOLID, in particular the Single Responsibility Principle (SRP):
1. Reducing Code Duplication
I found the same token calculation formula (Math.round(content.length / 4)) repeated in two different places. This is a classic DRY violation: if the formula needed updating, I'd have to remember to change it in multiple locations.
2. Improving Function Modularity
The command-line option parsing logic was another problem: a long run of repetitive if-statements, all doing basically the same thing with slight variations. I cleaned that up as well, with some help from Cursor.
3. Breaking Down Large Functions
The main action callback function was a monster at 110 lines, handling validation, parsing, execution, and output all in one place. This violated the Single Responsibility Principle spectacularly.
How I Fixed Each Issue
Refactoring #1: Extract Default Ignore Patterns
Before:
export async function collectFiles(...) {
  const defaultIgnore = [
    'node_modules/**',
    '.git/**',
    '*.log',
  ];
}
After:
const DEFAULT_IGNORE_PATTERNS = [
  'node_modules/**',
  '.git/**',
  '*.log',
];
export async function collectFiles(...) {}
Impact: This made the ignore patterns easy to find, document, and modify without touching function logic.
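To make the payoff concrete, here is a minimal, self-contained sketch of how code in the collectFiles style can consume the shared constant. The isIgnored helper and its crude glob handling are my illustration, not from the project (the real code walks the disk and uses proper glob matching):

```typescript
// The extracted constant: ignore patterns now live at module level,
// outside any function logic, so they are easy to find and modify.
const DEFAULT_IGNORE_PATTERNS = ['node_modules/**', '.git/**', '*.log'];

// Hypothetical helper showing one way the constant might be consumed.
// Deliberately crude glob handling for this sketch: a '/**' suffix
// means prefix match, a leading '*' means suffix match.
function isIgnored(filePath: string): boolean {
  return DEFAULT_IGNORE_PATTERNS.some((pattern) => {
    if (pattern.endsWith('/**')) {
      return filePath.startsWith(pattern.slice(0, -2));
    }
    if (pattern.startsWith('*')) {
      return filePath.endsWith(pattern.slice(1));
    }
    return filePath === pattern;
  });
}
```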
Refactoring #2: Extract Token Calculation Function
Before:
const fileTokens = Math.round(content.length / 4);
return files.reduce((total, file) => total + Math.round(file.content.length / 4), 0);
After:
export function calculateTokens(content: string): number {
return Math.round(content.length / 4);
}
const fileTokens = calculateTokens(content);
Impact: Single source of truth for token calculation. If the formula changes, update it once.
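Putting the before and after snippets together, a runnable version looks like this. The formula is the one from the post; totalTokens is a hypothetical name for the reduce shown in the "before" snippet:

```typescript
// Single source of truth for the token heuristic (~4 characters per token).
function calculateTokens(content: string): number {
  return Math.round(content.length / 4);
}

// The aggregation from the "before" snippet, rewritten to reuse the helper
// instead of repeating the formula inside the reduce callback.
function totalTokens(files: { content: string }[]): number {
  return files.reduce((total, file) => total + calculateTokens(file.content), 0);
}
```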
Refactoring #3: Extract Option Parsing Logic
Before:
.action(async (paths, options) => {
  const packagerOptions = {};
  if (options.include) {
    packagerOptions.include = options.include.split(',').map((p: string) => p.trim());
  }
  if (options.exclude) {
    packagerOptions.exclude = options.exclude.split(',').map((p: string) => p.trim());
  }
});
After:
function parseCommandLineOptions(options: any): PackagerOptions {
  const packagerOptions: PackagerOptions = {};
  if (options.include) {
    packagerOptions.include = options.include.split(',').map((p: string) => p.trim());
  }
  return packagerOptions;
}

.action(async (paths, options) => {
  const packagerOptions = parseCommandLineOptions(options);
});
Impact: The action callback became readable. Option parsing is now testable in isolation.
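Because the parser is now a plain function, it can be exercised without Commander at all. A minimal sketch, where the PackagerOptions shape is assumed from the snippets above (the real interface may carry more fields):

```typescript
interface PackagerOptions {
  include?: string[];
  exclude?: string[];
}

// Self-contained version of the extracted parser: takes the raw CLI
// option strings and normalizes them into arrays of trimmed patterns.
function parseCommandLineOptions(options: { include?: string; exclude?: string }): PackagerOptions {
  const packagerOptions: PackagerOptions = {};
  if (options.include) {
    packagerOptions.include = options.include.split(',').map((p) => p.trim());
  }
  if (options.exclude) {
    packagerOptions.exclude = options.exclude.split(',').map((p) => p.trim());
  }
  return packagerOptions;
}
```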
Refactoring #4: Split Large Action Function
Before:
.action(async (paths, options) => {
  // 110 lines of:
  // - input validation
  // - option parsing
  // - output validation
  // - execution
  // - result display
  // - error handling
});
After:
function validateInputPaths(paths: string[]): void { /* ... */ }
function validateOutputPath(outputFile: string): void { /* ... */ }
async function executePackaging(...): Promise<void> { /* ... */ }
function displayResults(...): void { /* ... */ }
.action(async (paths, options) => {
  validateInputPaths(paths);
  const packagerOptions = parseCommandLineOptions(options);
  const outputFile = options.output || 'output.md';
  validateOutputPath(outputFile);
  try {
    await executePackaging(paths, packagerOptions, outputFile);
    displayResults(outputFile, !!options.output);
  } catch (error) {
    console.error(chalk.red(`❌ Error: ${(error as Error).message}`));
    process.exit(1);
  }
});
Impact:
- Action callback reduced from 110 lines to 24 lines
- Each function has one clear purpose
- Much easier to test and debug
- Other parts of the codebase can reuse these functions
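The decomposition above can be sketched end to end with stubbed helpers. The function names come from the post, but the bodies are placeholder assumptions (the post elides them), so this only illustrates how thin the orchestration layer becomes:

```typescript
interface PackagerOptions {
  include?: string[];
  exclude?: string[];
}

// Stubbed helpers: the real bodies live in the project; these
// placeholders only illustrate the control flow.
function validateInputPaths(paths: string[]): void {
  if (paths.length === 0) throw new Error('No input paths provided');
}

function validateOutputPath(outputFile: string): void {
  if (outputFile.trim() === '') throw new Error('Output path is empty');
}

async function executePackaging(
  paths: string[],
  options: PackagerOptions,
  outputFile: string
): Promise<void> {
  // stub: the real function packages the repository and writes the file
}

// runAction mirrors the refactored action callback: validate inputs,
// default the output path, validate it, then delegate.
async function runAction(paths: string[], options: { output?: string }): Promise<string> {
  validateInputPaths(paths);
  const outputFile = options.output || 'output.md';
  validateOutputPath(outputFile);
  await executePackaging(paths, {}, outputFile);
  return outputFile;
}
```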
Refactoring #5: Extract Statistics Collection Class
Before:
// Inside the 167-line analyzeRepository() method:
let currentTokens = 0;
const fileTypes: Record<string, number> = {};
let largestFile: { path: string; lines: number } | null = null;
let totalCharacters = 0;
const processedDirectories = new Set<string>();
// 50 lines of statistics tracking mixed with file processing
After:
// New statistics.ts file
export class RepositoryStatistics {
  private totalCharacters: number = 0;
  private fileTypes: Record<string, number> = {};
  private largestFile: { path: string; lines: number } | null = null;

  public trackFile(fileInfo: FileInfo): void { /* ... */ }
  public wouldExceedTokenLimit(content: string, maxTokens?: number): boolean { /* ... */ }
  public getTotalCharacters(): number { /* ... */ }
}
// In packager.ts:
const stats = new RepositoryStatistics();
stats.trackFile(fileInfo);
Impact:
- Statistics logic is now reusable and testable
- analyzeRepository() became much more readable
- Clear separation of concerns
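A trimmed, runnable sketch of the class makes the testability point tangible. The method bodies below are illustrative guesses, since the post elides the real implementations:

```typescript
interface FileInfo {
  path: string;
  content: string;
  lines: number;
}

// Sketch of RepositoryStatistics with assumed method bodies: it
// accumulates character counts, tallies file extensions, and tracks
// the largest file seen so far.
class RepositoryStatistics {
  private totalCharacters = 0;
  private fileTypes: Record<string, number> = {};
  private largestFile: { path: string; lines: number } | null = null;

  public trackFile(fileInfo: FileInfo): void {
    this.totalCharacters += fileInfo.content.length;
    const ext = fileInfo.path.split('.').pop() ?? '';
    this.fileTypes[ext] = (this.fileTypes[ext] ?? 0) + 1;
    if (this.largestFile === null || fileInfo.lines > this.largestFile.lines) {
      this.largestFile = { path: fileInfo.path, lines: fileInfo.lines };
    }
  }

  public getTotalCharacters(): number {
    return this.totalCharacters;
  }
}
```

Because the class has no dependency on file-system code, a test can feed it synthetic FileInfo objects directly.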
Refactoring #6: Create Error Handling Utilities
Before:
// Scattered throughout the code:
process.stderr.write(`Error: File '${filePath}' not found\n`);
process.stderr.write(`Warning: File '${filePath}' no longer exists, skipping\n`);
process.stderr.write(`Skipping ${filePath}: file too large\n`);
After:
// New logger.ts file
export function logError(message: string): void {
  process.stderr.write(`Error: ${message}\n`);
}
export function logWarning(message: string): void {
  process.stderr.write(`Warning: ${message}\n`);
}
export function logFileError(filePath: string, error: any): void {
  if (error.code === 'ENOENT') {
    logError(`File '${filePath}' not found`);
  } else if (error.code === 'EACCES') {
    logError(`Permission denied reading '${filePath}'`);
  }
  // ... smart error handling
}
// In packager.ts:
logError(`Path '${singlePath}' does not exist`);
logWarning(`File '${filePath}' no longer exists, skipping`);
logFileError(filePath, error);
Impact:
- Consistent error message formatting
- Centralized logging makes it easy to add features like log levels or file logging
- Semantic function names make code self-documenting
The Git Rebase Experience
This is where things got interesting! The instructions called for an interactive rebase to squash all commits into one. Here's what happened:
Initial Attempt: Branch-Based Refactoring Workflow
Instead of performing everything directly on the main branch, I decided to use a safer and cleaner approach by creating a separate refactoring branch.
Step 1: Update main
I started by ensuring my local main branch was up to date:
git checkout main
git pull origin main
Step 2: Create a Refactoring Branch
Next, I created a new branch for all my refactoring work:
git checkout -b refactoring
Step 3: Make and Commit Changes
I made several improvements across the codebase, committing each logical change separately:
git add .
git commit -m "Refactored helper functions for clarity"
# ...edit, stage, and commit again for each subsequent change:
git commit -m "Simplified data handling logic"
git commit -m "Improved maintainability and readability"
Step 4: Squash Commits
After testing and reviewing, I decided to squash the multiple commits into one for a cleaner commit history:
git reset --soft HEAD~6
git commit -m "Refactoring codebase to improve maintainability and code quality"
The --soft flag moved the branch pointer back 6 commits while keeping all the changes staged, allowing me to create one detailed, consolidated commit.
Step 5: Merge Back to main
Once everything looked good, I merged the refactoring branch back into main using a fast-forward merge:
git checkout main
git merge --ff-only refactoring # <-- This ensures a clean, linear history
Step 6: Push to GitHub
Finally, I pushed my updated main branch to GitHub:
git push origin main
This worked perfectly!
This branch-based approach kept my main branch stable, avoided interactive rebase editor issues, and resulted in a clear and professional commit history.
Did I Find Any Bugs?
Surprisingly, no bugs were found during refactoring! However, the refactoring process did reveal several code smells and potential issues:
- Hidden complexity
- Testing difficulty
- Maintenance risks
Did I Break Anything?
No. To guard against regressions, I tested after each refactoring step by running:
npm run build
This ensured TypeScript compilation succeeded and no syntax errors were introduced.
What Went Well:
- Incremental commits: Making 6 small commits made the refactoring process manageable
- Clear commit messages: Each commit message explained exactly what changed
- Soft reset technique: When interactive rebase failed, the soft reset approach was actually simpler and more intuitive
Conclusion
The codebase is now significantly more maintainable, and I'm confident that future changes will be easier to implement. When we write automated tests later, having modular, focused functions will make that process much smoother.
Most importantly, this exercise taught me that refactoring isn't just about cleaning code—it's about making your future self's life easier.