This is a submission for the GitHub Finish-Up-A-Thon Challenge
🎯 The TL;DR
Last October, I built a scrappy 200-line Python script during a hackathon that analyzed Git commit history. It worked, barely. Then I abandoned it.
Eight months later, I picked it back up, rewrote it from scratch in Node.js, and turned it into RepoLens: a published CLI tool with 6 analysis engines, 36 automated tests, and CI/CD.
🤖 GitHub Copilot was my pair programmer through every line.
Here's the full comeback story.
📦 What I Built
RepoLens is a codebase intelligence tool that extracts hidden insights from your Git history. Think of it as an MRI scan for your codebase.
🔧 What It Does
| Command | What It Analyzes |
|---|---|
| `repolens analyze .` | Full codebase analysis (all engines) |
| `repolens ownership .` | Who owns what files, bus factor risk |
| `repolens complexity .` | Complexity trends over time |
| `repolens bugs .` | Bug hotspot detection from commit messages |
| `repolens deadcode .` | Potentially unused files (12+ months idle) |
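To make the ownership row concrete, here is a minimal sketch of how per-file ownership and a bus factor flag could be computed. It is not RepoLens's actual code: the input shape, the `computeOwnership` name, and the 80% risk threshold are all assumptions chosen for illustration.

```javascript
// Hypothetical input: one entry per commit, each with an author and
// a list of touched files plus the number of lines that commit added.
function computeOwnership(commits, riskThreshold = 0.8) {
  const byFile = new Map(); // path -> Map(author -> lines added)

  for (const { author, files } of commits) {
    for (const { path, added } of files) {
      if (!byFile.has(path)) byFile.set(path, new Map());
      const authors = byFile.get(path);
      authors.set(author, (authors.get(author) || 0) + added);
    }
  }

  const result = {};
  for (const [path, authors] of byFile) {
    const total = [...authors.values()].reduce((sum, n) => sum + n, 0);
    // The author with the most added lines is treated as the owner.
    const [owner, ownerLines] = [...authors.entries()]
      .sort((a, b) => b[1] - a[1])[0];
    const share = total === 0 ? 0 : ownerLines / total;
    result[path] = {
      owner,
      share,
      contributors: authors.size,
      // Assumed rule: a single dominant author means bus factor risk.
      atRisk: share >= riskThreshold,
    };
  }
  return result;
}
```

Counting added lines is only one possible ownership signal; a real tool might weight recent commits more heavily or use `git blame` instead.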
⚡ Quick Start

```bash
# Install from GitHub
npm install -g github:mamoor123/repolens

# Analyze any Git repository
repolens analyze /path/to/your/repo

# Or skip AI briefing for fast results
repolens analyze . --no-ai

# Export as JSON for pipelines
repolens analyze . --json --output report.json
```
🎬 Live Demo
Here's RepoLens analyzing its own codebase:
```text
════════════════════════════════════════════════════════════
  RepoLens Report: repolens
════════════════════════════════════════════════════════════

Overview
┌───────────────┬──────────────────┐
│ Repository    │ repolens         │
│ Total Commits │ 5                │
│ Total Files   │ 21               │
│ Contributors  │ 3                │
│ Timespan      │ Less than 1 week │
└───────────────┴──────────────────┘

Top Contributors (by lines of code)
┌─────────────┬─────────┬─────────────┬───────────────┬─────────────┐
│ Author      │ Commits │ Lines Added │ Lines Removed │ Ownership % │
├─────────────┼─────────┼─────────────┼───────────────┼─────────────┤
│ Alice Chen  │ 5       │ +2,847      │ -45           │ 98.4%       │
│ Copilot Bot │ 2       │ +312        │ -0            │ 10.8%       │
│ Test Runner │ 1       │ +85         │ -0            │ 2.9%        │
└─────────────┴─────────┴─────────────┴───────────────┴─────────────┘

Complexity Trend: → stable (0% change)
Bug Hotspots: 0 bug-fix commits detected
Dead Code: 0 files untouched for 12+ months
Critical Files: src/parser.js (risk score: 45.2)
```
🎬 Demo
📸 Screenshots
Full Analysis Report:
The analyze command runs all 6 engines and produces a beautiful terminal report with color-coded tables, trend indicators, and actionable insights.
Individual Analysis Commands:
Each engine can be run independently for focused deep-dives:
```bash
# Check who owns each file (bus factor analysis)
repolens ownership ./my-project

# Track complexity trends
repolens complexity ./my-project

# Find bug hotspots
repolens bugs ./my-project

# Detect dead code
repolens deadcode ./my-project
```

JSON Output for CI/CD Pipelines:

```bash
repolens analyze . --json --output report.json
```
Perfect for integrating into your GitHub Actions workflow or pre-commit hooks.
GitHub Repository | npm Package
The Comeback Story
Chapter 1: The Hackathon (October 2025)
It was 2 AM at a local hackathon. I needed a way to understand a messy codebase I'd inherited. So I threw together a Python script:
```python
# The original "tool": 200 lines of chaos
import subprocess

result = subprocess.run(['git', 'log', '--oneline'], capture_output=True, text=True)
# ... 198 more lines of regex and tears
```
It kinda worked. It printed some commit stats. I presented it, got a few nods, and never touched it again.
For 8 months, it sat in a forgotten folder on my laptop, collecting digital dust.
Chapter 2: The Resurrection (May 2026) 🔥
When I saw the GitHub Finish-Up-A-Thon Challenge, I knew exactly what to revive.
But here's the thing: the original script was unfixable. The architecture was wrong. The parsing was fragile. There were zero tests.
So I made a bold decision: rewrite everything from scratch in Node.js.
Why Node.js?
- 📦 npm distribution: a global `npm install -g` works anywhere Node.js runs
- 🎨 Beautiful CLI ecosystem: commander, chalk, ora, cli-table3
- ⚡ Async performance: non-blocking Git log parsing for large repos
- 🧪 Testing culture: Node's built-in test runner is fantastic
Chapter 3: Building With Copilot 🤖
This is where GitHub Copilot changed the game.
🧩 The Parser Problem
The hardest part was parsing Git's output format. Copilot suggested using a custom boundary marker approach:
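```javascript
// Copilot suggested this pattern, and it's brilliant:
// a custom marker starts every commit record.
const COMMIT_MARKER = '§COMMIT_BOUNDARY§';
const format = [
  COMMIT_MARKER,
  '%H',  // Full hash
  '%h',  // Short hash
  '%an', // Author name
  '%ae', // Author email
  '%ai', // Author date (ISO)
  '%s',  // Subject
  '%b'   // Body
].join('%x00');

// `output` is the stdout of `git log` run with the format above.
// Split on the boundary marker, then on the NUL separators.
const rawParts = output.split(COMMIT_MARKER);
for (const rawPart of rawParts) {
  const parts = rawPart.split('\x00');
  // The marker is itself followed by a NUL, so parts[0] is empty
  // and the full hash is the second field.
  const hash = parts[1]?.trim();
  if (!hash) continue;
  // ... extract fields
}
```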
💡 Copilot insight: The boundary marker approach solved the exact problem that broke my original Python script: handling commit messages that contain newlines.
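As a self-contained illustration of why this survives multi-line commit bodies, here is a small sketch that parses a hand-built sample string instead of real `git log` output. The `parseGitLog` function and the `FIELDS` list are hypothetical names, not the actual contents of `src/parser.js`.

```javascript
const COMMIT_MARKER = '§COMMIT_BOUNDARY§';

// Field order mirrors the format placeholders: %H %h %an %ae %ai %s %b
const FIELDS = ['hash', 'shortHash', 'author', 'email', 'date', 'subject', 'body'];

function parseGitLog(output) {
  const commits = [];
  for (const rawPart of output.split(COMMIT_MARKER)) {
    // Drop the empty piece that sits between the marker and the first NUL.
    const values = rawPart.split('\x00').slice(1);
    if (!values[0] || !values[0].trim()) continue; // no hash, not a commit

    const commit = {};
    FIELDS.forEach((name, i) => {
      commit[name] = (values[i] || '').trim();
    });
    commits.push(commit);
  }
  return commits;
}

// Hand-built sample: the first commit has a two-line body.
const sampleLog =
  [COMMIT_MARKER, 'a'.repeat(40), 'aaaaaaa', 'Alice', 'alice@example.com',
    '2026-05-01 10:00:00 +0000', 'fix: crash on empty repo',
    'First body line\nSecond body line\n'].join('\x00') +
  [COMMIT_MARKER, 'b'.repeat(40), 'bbbbbbb', 'Bob', 'bob@example.com',
    '2026-05-02 11:00:00 +0000', 'docs: update README', '\n'].join('\x00');

console.log(parseGitLog(sampleLog).map((c) => c.subject));
```

Because records are delimited by the marker rather than by line breaks, the newline inside the first body never splits the commit in two.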
🔬 The Bug Archaeology Engine
For detecting bug-fix commits, Copilot helped me build a 13-pattern regex system that catches messages ranging from `fix: login bug` to `SECURITY: XSS vulnerability patched`:
```javascript
const BUG_PATTERNS = [
  /\bfix(?:ed|es|ing)?\b/i,
  /\bbug\s*fix\b/i,
  /\bhotfix\b/i,
  /\bpatch(?:ed|ing)?\b/i,
  /\bsecurity\b/i,
  /\bCVE-\d+/i,
  /\bvulnerability\b/i,
  // ... 6 more patterns
];
```
Copilot didn't just suggest the patterns; it explained why each one catches a different commit style. Some developers write `fix:`, others write `bug fix`, others write `patched`. Together, the 13 patterns cover the common conventions.
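Here is a minimal sketch of how patterns like these could be turned into a hotspot count. It uses only the seven patterns shown above; the `isBugFix` and `countBugFixesByFile` helpers, and the assumption that each commit carries a `files` array of paths, are hypothetical.

```javascript
// Only the seven patterns shown in the post; the remaining six are not listed there.
const SHOWN_BUG_PATTERNS = [
  /\bfix(?:ed|es|ing)?\b/i,
  /\bbug\s*fix\b/i,
  /\bhotfix\b/i,
  /\bpatch(?:ed|ing)?\b/i,
  /\bsecurity\b/i,
  /\bCVE-\d+/i,
  /\bvulnerability\b/i,
];

// True if any pattern matches the commit subject.
function isBugFix(subject) {
  return SHOWN_BUG_PATTERNS.some((pattern) => pattern.test(subject));
}

// Count how many bug-fix commits touched each file.
function countBugFixesByFile(commits) {
  const counts = {};
  for (const { subject, files } of commits) {
    if (!isBugFix(subject)) continue;
    for (const path of files) {
      counts[path] = (counts[path] || 0) + 1;
    }
  }
  return counts;
}

console.log(isBugFix('fix: login bug'));    // true
console.log(isBugFix('add prefix option')); // false
```

The `\b` word boundaries are what keep words like "prefix" or "dispatch" from counting as bug fixes.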
🧪 Test Generation
Here's where Copilot really shone. For each analyzer, I'd write one test as a template, and Copilot would suggest the remaining test cases, including edge cases I hadn't thought of:
```javascript
// I wrote this one
test('should detect bug-fix commits', () => {
  const commits = [{ subject: 'fix: login page crash', files: [] }];
  // ...
});

// Copilot suggested these edge cases (bodies elided)
test('should handle empty commit list', () => { /* ... */ });
test('should detect CVE references', () => { /* ... */ });
test('should not false-positive on "prefix" in subject', () => { /* ... */ });
test('should handle binary files (dash stats)', () => { /* ... */ });
```
Result: 36 tests, all passing. Most of the edge cases were Copilot's ideas.
Chapter 4: The Numbers
| Metric | Before (Oct 2025) | After (June 2026) |
|---|---|---|
| Language | Python | Node.js |
| Lines of Code | ~200 | ~1,500 |
| Tests | 0 | 36 |
| CI/CD | None | GitHub Actions (Node 18/20/22) |
| Analysis Engines | 1 (basic stats) | 6 (ownership, complexity, bugs, dead code, dependencies, AI) |
| Distribution | Copy-paste script | npm-installable package (from GitHub) |
| Error Handling | None | Graceful with spinners |
| Documentation | None | Full README with examples |
🤖 My Experience with GitHub Copilot
I've been using GitHub Copilot for over a year now, but this project made me appreciate it on a different level. Here's what changed:
Where Copilot Excelled
1. Algorithm Design 🧠
The co-change analysis for dependency mapping was the most complex algorithm. Copilot didn't just autocomplete; it reasoned about the approach:
"If file A and file B frequently change in the same commit, they're coupled. Use a co-change matrix with a threshold of 3+ co-changes to identify tight coupling."
That's exactly what I built. The coupling cluster detection uses an expanding flood-fill algorithm that Copilot helped me design.
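A rough sketch of that idea, assuming each commit is reduced to a list of file paths: count how often every pair of files changes together, keep pairs at or above the threshold, then flood-fill the resulting graph to find clusters. The function names are hypothetical and this is one plausible reading of the approach, not the code in `dependencies.js`.

```javascript
// Count how many commits touched each unordered pair of files.
function buildCoChangeCounts(commits) {
  const counts = new Map(); // "fileA\0fileB" -> number of shared commits
  for (const { files } of commits) {
    const unique = [...new Set(files)].sort();
    for (let i = 0; i < unique.length; i++) {
      for (let j = i + 1; j < unique.length; j++) {
        const key = unique[i] + '\u0000' + unique[j];
        counts.set(key, (counts.get(key) || 0) + 1);
      }
    }
  }
  return counts;
}

// Keep pairs with at least `threshold` co-changes, then flood-fill
// (breadth-first) outward from each unvisited file to collect clusters.
function findCouplingClusters(counts, threshold = 3) {
  const graph = new Map(); // file -> Set of tightly coupled files
  const link = (a, b) => {
    if (!graph.has(a)) graph.set(a, new Set());
    graph.get(a).add(b);
  };
  for (const [key, count] of counts) {
    if (count < threshold) continue;
    const [a, b] = key.split('\u0000');
    link(a, b);
    link(b, a);
  }

  const seen = new Set();
  const clusters = [];
  for (const start of graph.keys()) {
    if (seen.has(start)) continue;
    const cluster = [];
    const queue = [start];
    seen.add(start);
    while (queue.length > 0) {
      const file = queue.shift();
      cluster.push(file);
      for (const neighbor of graph.get(file)) {
        if (!seen.has(neighbor)) {
          seen.add(neighbor);
          queue.push(neighbor);
        }
      }
    }
    clusters.push(cluster.sort());
  }
  return clusters;
}
```

Note that pair counting is quadratic in the number of files per commit, so a real implementation would probably skip very large commits such as bulk reformatting.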
2. Regex Pattern Generation 🎯
Writing 13 bug-detection regex patterns by hand would have taken hours. Copilot generated them in minutes, and each one was correct on the first try.
3. Test Case Ideation 🧪
Copilot suggested edge cases I would have missed:
- Commits with no files
- Binary files showing `-` in line counts
- Empty git log output
- NUL byte handling in commit metadata
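The binary-file case is easy to miss: `git log --numstat` prints `-` instead of numbers for the added and removed columns of binary files. A minimal sketch of a line parser that tolerates this, with `parseNumstatLine` as a hypothetical helper name:

```javascript
// Parse one `--numstat` line: "<added>\t<removed>\t<path>".
// Binary files report "-" in both count columns.
function parseNumstatLine(line) {
  const match = /^(\d+|-)\t(\d+|-)\t(.+)$/.exec(line);
  if (!match) return null; // blank lines, headers, or anything unexpected

  const [, added, removed, path] = match;
  const binary = added === '-' || removed === '-';
  return {
    path,
    binary,
    // Treat binary changes as zero lines rather than NaN.
    added: binary ? 0 : Number(added),
    removed: binary ? 0 : Number(removed),
  };
}

console.log(parseNumstatLine('12\t3\tsrc/parser.js'));
console.log(parseNumstatLine('-\t-\tassets/logo.png'));
```

Without the explicit `-` branch, `Number('-')` yields `NaN`, which would poison every total computed downstream.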
💡 What I Learned
- Copilot is best as a thinking partner, not a code generator. I'd describe what I wanted in comments, and Copilot would suggest implementations, but I always reviewed and adapted them.
- The chat feature is underrated. For complex problems, I'd use Copilot Chat to reason through architecture decisions before writing code.
- It's especially good at Node.js patterns. The npm ecosystem has strong conventions, and Copilot knows them well.
Architecture Deep Dive
For those interested in how RepoLens works under the hood:
```text
repolens/
├── bin/repolens.js              # CLI entry point (commander)
├── src/
│   ├── parser.js                # Git log parser with boundary markers
│   ├── analyzers/
│   │   ├── ownership.js         # File ownership + bus factor
│   │   ├── complexity.js        # Complexity timeline + churn
│   │   ├── bugs.js              # Bug archaeology (13 regex patterns)
│   │   ├── deadcode.js          # Dead code detection
│   │   └── dependencies.js      # Co-change coupling analysis
│   ├── ai/
│   │   └── briefing.js          # AI codebase briefing (template + LLM)
│   └── utils/
│       └── format.js            # Output formatting
├── test/                        # 36 tests across 7 suites
└── .github/workflows/ci.yml     # CI on Node 18/20/22
```
Key Design Decisions
- Zero runtime dependencies for the core: the analysis engines use only Node.js built-ins
- Boundary marker parsing: solves the newline-in-commit-message problem
- Co-change coupling: inspired by research on software evolution
- Template + LLM AI briefing: works without an API key, but upgrades when one is available
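The last decision, a template that upgrades to an LLM when a key is present, could look roughly like the sketch below. Everything here is an assumption for illustration: the `stats` fields, the `callModel` hook, and the function names are hypothetical and do not describe the real `briefing.js`.

```javascript
// Deterministic briefing built from analysis results alone.
function templateBriefing(stats) {
  return `${stats.repo}: ${stats.commits} commits by ${stats.contributors} ` +
    `contributors; ${stats.bugFixes} bug-fix commits detected.`;
}

// Use the template by default; only call the model when both an API key
// and a caller-supplied `callModel(apiKey, prompt)` function are available.
async function buildBriefing(stats, { apiKey, callModel } = {}) {
  const fallback = templateBriefing(stats);
  if (!apiKey || typeof callModel !== 'function') return fallback;

  try {
    const text = await callModel(apiKey, fallback);
    return text && text.trim() ? text : fallback;
  } catch {
    // Network or quota failures should never break the report.
    return fallback;
  }
}
```

Keeping the model call behind an injected function means the core stays dependency-free and the fallback path can be tested without network access.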
🔮 What's Next
RepoLens is just getting started. Here's what I'm planning:
- [ ] Publish to npm for global installation
- [ ] Web dashboard with D3.js visualizations
- [ ] GitHub Action for automated PR analysis
- [ ] SonarQube-compatible output format
- [ ] Python SDK for programmatic access
- [ ] Enhanced AI briefing with GitHub Models
Acknowledgments
- GitHub Copilot: for being the best pair programmer I've ever had
- DEV Community: for hosting this amazing challenge
- GitHub: for the platform that makes all of this possible
- The open-source ecosystem: commander, chalk, ora, cli-table3
Links
- Source Code
- Report Issues
- Star on GitHub
Built with ❤️ and a lot of ☕ for the GitHub Finish-Up-A-Thon Challenge
What abandoned project are YOU going to revive? Drop a comment below!