Mamoor Ahmad

Posted on Jun 1 • Originally published at dev.to

I Revived My Abandoned 200-Line Python Script Into a Published npm Package — Here's How GitHub Copilot Made It Happen 🚀

#devchallenge #githubchallenge #githubcopilot #node

This is a submission for the GitHub Finish-Up-A-Thon Challenge

🎯 The TL;DR

Last October, I built a scrappy 200-line Python script during a hackathon that analyzed Git commit history. It worked — barely. Then I abandoned it.

Eight months later, I picked it back up, rewrote it from scratch in Node.js, and turned it into RepoLens — a published CLI tool with 6 analysis engines, 36 automated tests, and CI/CD.

🤖 GitHub Copilot was my pair programmer through every line.

Here's the full comeback story. 👇

📦 What I Built

RepoLens is a codebase intelligence tool that extracts hidden insights from your Git history. Think of it as an MRI scan for your codebase.

🧠 What It Does

Command	What It Analyzes
`repolens analyze .`	🔍 Full codebase analysis (all engines)
`repolens ownership .`	👥 Who owns what files, bus factor risk
`repolens complexity .`	📈 Complexity trends over time
`repolens bugs .`	🐛 Bug hotspot detection from commit messages
`repolens deadcode .`	💀 Potentially unused files (12+ months idle)

⚡ Quick Start

# Install from GitHub
npm install -g github:mamoor123/repolens

# Analyze any Git repository
repolens analyze /path/to/your/repo

# Or skip AI briefing for fast results
repolens analyze . --no-ai

# Export as JSON for pipelines
repolens analyze . --json --output report.json

🎬 Live Demo

Here's RepoLens analyzing its own codebase:

━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━
  🔍 RepoLens Report: repolens
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━

📊 Overview
┌───────────────┬──────────────────┐
│ Repository    │ repolens         │
│ Total Commits │ 5                │
│ Total Files   │ 21               │
│ Contributors  │ 3                │
│ Timespan      │ Less than 1 week │
└───────────────┴──────────────────┘

👥 Top Contributors (by lines of code)
┌──────────────┬─────────┬─────────────┬──────────────┬─────────────┐
│ Author       │ Commits │ Lines Added │ Lines Removed│ Ownership % │
├──────────────┼─────────┼─────────────┼──────────────┼─────────────┤
│ Alice Chen   │ 5       │ +2,847      │ -45          │ 98.4%       │
│ Copilot Bot  │ 2       │ +312        │ -0           │ 10.8%       │
│ Test Runner  │ 1       │ +85         │ -0           │ 2.9%        │
└──────────────┴─────────┴─────────────┴──────────────┴─────────────┘

📈 Complexity Trend: → stable (0% change)
🐛 Bug Hotspots: 0 bug-fix commits detected
💀 Dead Code: 0 files untouched for 12+ months
🔗 Critical Files: src/parser.js (risk score: 45.2)

👉 Try it on your own repo

📸 Screenshots

Full Analysis Report:

The analyze command runs all 6 engines and produces a beautiful terminal report with color-coded tables, trend indicators, and actionable insights.

Individual Analysis Commands:

Each engine can be run independently for focused deep-dives:

# Check who owns each file (bus factor analysis)
repolens ownership ./my-project

# Track complexity trends
repolens complexity ./my-project

# Find bug hotspots
repolens bugs ./my-project

# Detect dead code
repolens deadcode ./my-project

JSON Output for CI/CD Pipelines:

repolens analyze . --json --output report.json

Perfect for integrating into your GitHub Actions workflow or pre-commit hooks.

🔗 GitHub Repository | 📦 npm Package

📖 The Comeback Story

Chapter 1: The Hackathon (October 2025) 🏚️

It was 2 AM at a local hackathon. I needed a way to understand a messy codebase I'd inherited. So I threw together a Python script:

# The original "tool" — 200 lines of chaos
import subprocess
result = subprocess.run(['git', 'log', '--oneline'], capture_output=True, text=True)
# ... 198 more lines of regex and tears

It kinda worked. It printed some commit stats. I presented it, got a few nods, and never touched it again.

For 8 months, it sat in a forgotten folder on my laptop, collecting digital dust.

Chapter 2: The Resurrection (May 2026) 🔥

When I saw the GitHub Finish-Up-A-Thon Challenge, I knew exactly what to revive.

But here's the thing — the original script was unfixable. The architecture was wrong. The parsing was fragile. There were zero tests.

So I made a bold decision: rewrite everything from scratch in Node.js.

Why Node.js?

📦 npm distribution: a single npm install -g command puts the CLI on any machine with Node.js
🎨 Beautiful CLI ecosystem: commander, chalk, ora, cli-table3
⚡ Async performance: Git log output from large repos can be processed without blocking
🧪 Testing culture: Node's built-in test runner is fantastic

Chapter 3: Building With Copilot 🤖

This is where GitHub Copilot changed the game.

🧩 The Parser Problem

The hardest part was parsing Git's output format. Copilot suggested using a custom boundary marker approach:

// Copilot suggested this pattern — it's brilliant
const COMMIT_MARKER = '§COMMIT_BOUNDARY§';

const format = [
  COMMIT_MARKER,
  '%H',   // Full hash
  '%h',   // Short hash
  '%an',  // Author name
  '%ae',  // Author email
  '%ai',  // Author date (ISO)
  '%s',   // Subject
  '%b'    // Body
].join('%x00');

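// Parse using the boundary marker
const rawParts = output.split(COMMIT_MARKER);
for (const rawPart of rawParts) {
  // The marker is joined to %H with a NUL byte, so index 0 is always empty
  const [, hash, shortHash, author, email, date, subject, body] = rawPart.split('\x00');
  if (!hash) continue; // skip the empty chunk before the first marker
  // ... build the commit record from these fields
}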

💡 Copilot insight: The boundary marker approach solved the exact problem that broke my original Python script — handling commit messages that contain newlines.
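
To show why newlines stop being a problem, here's a minimal, self-contained sketch of the parsing step run against synthetic input. It is a simplified illustration, not the exact code in the repo: parseGitLog, the record helper, and the sample data are names I made up for this example, and the input only imitates what git log would print for the format above.

```javascript
const COMMIT_MARKER = '§COMMIT_BOUNDARY§';

// Illustrative parser: splits on the marker, then on NUL bytes.
function parseGitLog(output) {
  const commits = [];
  for (const rawPart of output.split(COMMIT_MARKER)) {
    // Index 0 is empty because the marker is followed by a NUL separator.
    const [, hash, shortHash, author, email, date, subject, body = ''] =
      rawPart.split('\x00');
    if (!hash) continue; // empty chunk before the first marker
    commits.push({
      hash: hash.trim(),
      shortHash,
      author,
      email,
      date,
      subject,
      body: body.trim(), // may legitimately span several lines
    });
  }
  return commits;
}

// Hypothetical helper that fakes one record in the same layout as the format string.
const record = (fields) => [COMMIT_MARKER, ...fields].join('\x00') + '\n';

const sample =
  record(['abc123', 'abc', 'Ada', 'ada@example.com', '2026-05-01', 'fix: crash', 'line one\nline two']) +
  record(['def456', 'def', 'Bob', 'bob@example.com', '2026-05-02', 'docs: readme', '']);

const commits = parseGitLog(sample);
console.log(commits.length, JSON.stringify(commits[0].body));
```

Because fields are separated by NUL bytes rather than newlines, the two-line body of the first commit stays inside a single field.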

🔬 The Bug Archaeology Engine

For detecting bug-fix commits, Copilot helped me build a 13-pattern regex system that catches everything from `fix: login bug` to `SECURITY: XSS vulnerability patched`:

const BUG_PATTERNS = [
  /\bfix(?:ed|es|ing)?\b/i,
  /\bbug\s*fix\b/i,
  /\bhotfix\b/i,
  /\bpatch(?:ed|ing)?\b/i,
  /\bsecurity\b/i,
  /\bCVE-\d+/i,
  /\bvulnerability\b/i,
  // ... 6 more patterns
];

Copilot didn't just suggest the patterns; it explained why each one catches a different commit style. Some developers write `fix:`, others write `bug fix`, others write `patched`. The 13 patterns cover all of these.
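
Here's a rough sketch of how patterns like these can be turned into a hotspot count. It uses only the seven patterns shown above (the other six aren't reproduced here), and isBugFix, findBugHotspots, and the commit shape ({ subject, files }) are assumptions for illustration rather than the tool's real API.

```javascript
// Subset of the patterns listed above.
const BUG_PATTERNS = [
  /\bfix(?:ed|es|ing)?\b/i,
  /\bbug\s*fix\b/i,
  /\bhotfix\b/i,
  /\bpatch(?:ed|ing)?\b/i,
  /\bsecurity\b/i,
  /\bCVE-\d+/i,
  /\bvulnerability\b/i,
];

// True when any pattern matches the commit subject.
function isBugFix(subject) {
  return BUG_PATTERNS.some((pattern) => pattern.test(subject));
}

// Count bug-fix commits per file, most frequently fixed files first.
function findBugHotspots(commits) {
  const counts = new Map();
  for (const { subject, files = [] } of commits) {
    if (!isBugFix(subject)) continue;
    for (const file of files) counts.set(file, (counts.get(file) ?? 0) + 1);
  }
  return [...counts.entries()].sort((a, b) => b[1] - a[1]);
}

const hotspots = findBugHotspots([
  { subject: 'fix: login bug', files: ['src/auth.js'] },
  { subject: 'SECURITY: XSS vulnerability patched', files: ['src/auth.js', 'src/view.js'] },
  { subject: 'add prefix option', files: ['src/cli.js'] },
]);
console.log(hotspots);
```

The word boundary in front of fix is what keeps a subject like "add prefix option" from being counted.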

🧪 Test Generation

Here's where Copilot really shone. For each analyzer, I'd write one test as a template, and Copilot would suggest the remaining test cases, including edge cases I hadn't thought of:

// I wrote this one
test('should detect bug-fix commits', () => {
  const commits = [{ subject: 'fix: login page crash', files: [] }];
  // ...
});

// Copilot suggested these edge cases
test('should handle empty commit list', () => { /* ... */ });
test('should detect CVE references', () => { /* ... */ });
test('should not false-positive on "prefix" in subject', () => { /* ... */ });
test('should handle binary files (dash stats)', () => { /* ... */ });

Result: 36 tests, all passing. Most of the edge cases were Copilot's ideas.

Chapter 4: The Numbers 📊

Metric	Before (Oct 2025)	After (June 2026)
Language	Python	Node.js
Lines of Code	~200	~1,500
Tests	0	36 ✅
CI/CD	None	GitHub Actions (Node 18/20/22)
Analysis Engines	1 (basic stats)	6 (ownership, complexity, bugs, dead code, dependencies, AI)
Distribution	Copy-paste script	npm package
Error Handling	None	Graceful with spinners
Documentation	None	Full README with examples

🤖 My Experience with GitHub Copilot

I've been using GitHub Copilot for over a year now, but this project made me appreciate it on a different level. Here's what changed:

🏆 Where Copilot Excelled

1. Algorithm Design 🧠

The co-change analysis for dependency mapping was the most complex algorithm. Copilot didn't just autocomplete — it reasoned about the approach:

"If file A and file B frequently change in the same commit, they're coupled. Use a co-change matrix with a threshold of 3+ co-changes to identify tight coupling."

That's exactly what I built. The coupling cluster detection uses an expanding flood-fill algorithm that Copilot helped me design.
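
The idea can be sketched in a few dozen lines. This is a simplified version under my own assumptions, not the exact code in the repo: coChangeCounts and couplingClusters are illustrative names, commits are assumed to look like { files: [...] }, and the flood fill is a plain stack-based traversal over pairs that meet the threshold of 3.

```javascript
// Count how often each pair of files changes in the same commit.
function coChangeCounts(commits) {
  const counts = new Map(); // "a\u0000b" -> number of shared commits
  for (const { files } of commits) {
    const unique = [...new Set(files)].sort();
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = `${unique[i]}\u0000${unique[j]}`;
        counts.set(key, (counts.get(key) ?? 0) + 1);
      }
    }
  }
  return counts;
}

// Group files into clusters by flood-filling across tightly coupled pairs.
function couplingClusters(commits, threshold = 3) {
  const graph = new Map();
  const link = (a, b) => {
    if (!graph.has(a)) graph.set(a, new Set());
    graph.get(a).add(b);
  };
  for (const [key, count] of coChangeCounts(commits)) {
    if (count < threshold) continue; // below the coupling threshold
    const [a, b] = key.split('\u0000');
    link(a, b);
    link(b, a);
  }

  const seen = new Set();
  const clusters = [];
  for (const start of graph.keys()) {
    if (seen.has(start)) continue;
    const cluster = [];
    const stack = [start];
    seen.add(start);
    while (stack.length > 0) {
      const file = stack.pop();
      cluster.push(file);
      for (const next of graph.get(file)) {
        if (seen.has(next)) continue;
        seen.add(next);
        stack.push(next);
      }
    }
    clusters.push(cluster.sort());
  }
  return clusters;
}
```

With three commits touching a.js and b.js, three touching b.js and c.js, and one touching d.js and e.js, all of a.js, b.js, and c.js land in one cluster, while the d.js/e.js pair is ignored because it never reaches the threshold.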

2. Regex Pattern Generation 🎯

Writing 13 bug-detection regex patterns by hand would have taken hours. Copilot generated them in minutes, and each one was correct on the first try.

3. Test Case Ideation 🧪

Copilot suggested edge cases I would have missed:

Commits with no files
Binary files showing `-` in line counts
Empty git log output
NUL byte handling in commit metadata
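
A couple of these edge cases are easy to show in code. The sketch below assumes the per-file stats come from git log --numstat, which prints added and removed counts separated by tabs and uses - in both columns for binary files. parseNumstatLine and parseNumstat are illustrative names, not the tool's actual functions.

```javascript
// Parse one numstat line: "<added>\t<removed>\t<path>".
function parseNumstatLine(line) {
  const match = /^(\d+|-)\t(\d+|-)\t(.+)$/.exec(line);
  if (!match) return null; // blank lines and anything that isn't a stat row
  const [, added, removed, path] = match;
  const binary = added === '-' || removed === '-';
  return {
    path,
    binary,
    added: binary ? 0 : Number(added), // binary files contribute no line counts
    removed: binary ? 0 : Number(removed),
  };
}

// Parse a block of numstat output, dropping lines that aren't stat rows.
function parseNumstat(block) {
  return block.split('\n').map(parseNumstatLine).filter(Boolean);
}

console.log(parseNumstat('12\t3\tsrc/parser.js\n-\t-\tassets/logo.png\n'));
```

Treating - as zero keeps binary files out of the line totals instead of turning them into NaN, and an empty string simply yields an empty list.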

💡 What I Learned

Copilot is best as a thinking partner, not a code generator. I'd describe what I wanted in comments, and Copilot would suggest implementations — but I always reviewed and adapted them.
The chat feature is underrated. For complex problems, I'd use Copilot Chat to reason through architecture decisions before writing code.
It's especially good at Node.js patterns. The npm ecosystem has strong conventions, and Copilot knows them well.

🏗️ Architecture Deep Dive

For those interested in how RepoLens works under the hood:

repolens/
├── bin/repolens.js          # CLI entry point (commander)
├── src/
│   ├── parser.js            # Git log parser with boundary markers
│   ├── analyzers/
│   │   ├── ownership.js     # File ownership + bus factor
│   │   ├── complexity.js    # Complexity timeline + churn
│   │   ├── bugs.js          # Bug archaeology (13 regex patterns)
│   │   ├── deadcode.js      # Dead code detection
│   │   └── dependencies.js  # Co-change coupling analysis
│   ├── ai/
│   │   └── briefing.js      # AI codebase briefing (template + LLM)
│   └── utils/
│       └── format.js        # Output formatting
├── test/                    # 36 tests across 7 suites
└── .github/workflows/ci.yml # CI on Node 18/20/22
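
To make the analyzers/ layer a bit more concrete, here's a rough sketch of what an ownership pass could compute. Everything in it is an assumption for illustration: the ownership function name, the { author, added } input shape, and the bus factor definition (the smallest number of authors who together account for more than half of the added lines) are mine, not necessarily what ownership.js does.

```javascript
// Rank authors by lines added and derive a simple bus factor.
function ownership(commits) {
  const linesByAuthor = new Map();
  let total = 0;
  for (const { author, added } of commits) {
    linesByAuthor.set(author, (linesByAuthor.get(author) ?? 0) + added);
    total += added;
  }

  const ranked = [...linesByAuthor.entries()]
    .sort((a, b) => b[1] - a[1])
    .map(([author, lines]) => ({ author, lines, share: total ? lines / total : 0 }));

  // Walk down the ranking until more than half of all lines are covered.
  let covered = 0;
  let busFactor = 0;
  for (const { lines } of ranked) {
    if (covered * 2 > total) break;
    covered += lines;
    busFactor += 1;
  }
  return { ranked, busFactor };
}

console.log(ownership([
  { author: 'A', added: 60 },
  { author: 'B', added: 30 },
  { author: 'A', added: 10 },
]));
```

A bus factor of 1 means a single author accounts for most of the code, which is the kind of risk the ownership command is meant to surface.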

🔑 Key Design Decisions

Zero runtime dependencies for the core — The analysis engines use only Node.js built-ins
Boundary marker parsing — Solves the newline-in-commit-message problem
Co-change coupling — Inspired by research on software evolution
Template + LLM AI briefing — Works without an API key, but upgrades when one is available
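
The template-first fallback from the last point might look roughly like this. It's a sketch under stated assumptions rather than the actual briefing.js: the report fields are invented for the example, and callModel is a hypothetical async function supplied by the caller, so no real LLM API is implied.

```javascript
// Template path: always available, needs no network access or API key.
function templateBriefing(report) {
  return [
    `${report.name}: ${report.totalCommits} commits from ${report.contributors} contributors.`,
    `Bug-fix commits detected: ${report.bugFixCommits}.`,
    `Files idle for 12+ months: ${report.deadFiles}.`,
  ].join('\n');
}

// Upgrade path: only used when both a key and a model caller are provided.
async function buildBriefing(report, { apiKey, callModel } = {}) {
  const fallback = templateBriefing(report);
  if (!apiKey || typeof callModel !== 'function') return fallback;
  try {
    return await callModel(`Summarize this repository report:\n${fallback}`, apiKey);
  } catch {
    return fallback; // degrade to the template if the model call fails
  }
}

const report = { name: 'repolens', totalCommits: 5, contributors: 3, bugFixCommits: 0, deadFiles: 0 };
buildBriefing(report).then((text) => console.log(text));
```

Computing the template first means the tool always has something to print, and a failed or missing model call never turns into a failed analysis.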

🔮 What's Next

RepoLens is just getting started. Here's what I'm planning:

[ ] 📦 Publish to npm for global installation
[ ] 🌐 Web dashboard with D3.js visualizations
[ ] 🔄 GitHub Action for automated PR analysis
[ ] 📊 SonarQube-compatible output format
[ ] 🐍 Python SDK for programmatic access
[ ] 🤖 Enhanced AI briefing with GitHub Models
