DEV Community

sakshsky

Building repomeld: From Simple Script to Production-Ready CLI Tool

The Problem That Started It All

It was 3 AM, and I was staring at yet another ChatGPT conversation, manually copying files from my project one by one. "Here's index.js... now here's utils.js... oh, and don't forget config.js..."

I was building an AI-powered feature and needed to give the model context about my entire codebase. But copy-pasting 20+ files? Every. Single. Time.

There had to be a better way.

That's when the idea hit me: What if I could merge my entire repository into a single file with one command?

Three months later, repomeld was born, and it's now helping thousands of developers prepare context for AI tools, conduct code reviews, and archive their projects.


The Journey: From 200 Lines to Production

Week 1: The MVP (Minimum Viable Product)

The first version was embarrassingly simple:

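// Just 200 lines of synchronous code (condensed here to the core idea)
const fs = require('fs');
const path = require('path');
function getAllFiles(dir) {
  let output = '';
  // Read all files recursively
  for (const entry of fs.readdirSync(dir, { withFileTypes: true })) {
    if (entry.name === 'node_modules') continue; // Skip node_modules
    const full = path.join(dir, entry.name);
    // Concatenate them
    output += entry.isDirectory() ? getAllFiles(full) : fs.readFileSync(full, 'utf8') + '\n';
  }
  return output;
}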

It worked... barely. It took 30 seconds to scan a medium-sized project and crashed on anything with binary files.

Week 3: Adding Real Features

I realized I was building something people actually wanted. The GitHub issues started rolling in:

  • "Can you add .gitignore support?"
  • "What about binary file detection?"
  • "Make it faster!"

So I rebuilt everything.


The Technical Deep Dive

Here's what I learned building a production-grade CLI tool:

1. Performance Optimization: The 10x Improvement

The Problem: The initial version used synchronous fs.readdirSync, which blocks the event loop, so every directory was read strictly one after another.

The Solution: Async iteration with intelligent caching.

// Before: Blocking and slow
const files = fs.readdirSync(dirPath);
for (const file of files) {
  // Process each file...
}

// After: Async and 10x faster (runs inside an async function)
const entries = await fs.promises.readdir(currentDir, { withFileTypes: true });
await Promise.all(entries.map(async (entry) => {
  // Process concurrently
}));

Result: Scanning 10,000 files went from 45 seconds to 3.2 seconds.
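
To make the "after" pattern concrete, here is a minimal sketch of a recursive concurrent walk. The function name walk and the skipNames parameter are my own placeholders for illustration, not repomeld's actual internals.

```javascript
const fs = require('fs/promises');
const path = require('path');

// Hypothetical helper: recursively collect file paths under rootDir,
// reading sibling directories concurrently instead of one at a time.
async function walk(rootDir, skipNames = new Set(['node_modules', '.git'])) {
  const entries = await fs.readdir(rootDir, { withFileTypes: true });
  const nested = await Promise.all(entries.map(async (entry) => {
    if (skipNames.has(entry.name)) return [];
    const fullPath = path.join(rootDir, entry.name);
    if (entry.isDirectory()) return walk(fullPath, skipNames);
    // Ignore sockets, FIFOs, and other non-regular entries
    return entry.isFile() ? [fullPath] : [];
  }));
  return nested.flat();
}

// Usage (inside an async function):
//   const files = await walk(process.cwd());
```

One design note: Promise.all here is unbounded, which is fine for listing directories but may need a concurrency limit once each callback also opens and reads files.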

2. Binary Detection: The Tricky Part

Detecting binary files sounds simple until you realize that valid text can contain null bytes (UTF-16 files are full of them) and some binaries contain none at all.

The Solution: Hybrid approach - extension blacklist + content sampling.

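// Assumes fs is require('fs/promises') and binaryCache is a Map
async function isBinaryFileFast(filePath) {
  // Cache results
  if (binaryCache.has(filePath)) return binaryCache.get(filePath);

  // Quick extension check
  const ext = path.extname(filePath).slice(1);
  let isBinary = BINARY_EXTENSIONS.has(ext);

  if (!isBinary) {
    // Sample only the first 512 bytes instead of reading the whole file
    const handle = await fs.open(filePath, 'r');
    try {
      const { buffer, bytesRead } = await handle.read(Buffer.alloc(512), 0, 512, 0);
      isBinary = buffer.subarray(0, bytesRead).includes(0); // Null byte = binary
    } finally {
      await handle.close();
    }
  }

  binaryCache.set(filePath, isBinary); // Store the result so the cache is actually used
  return isBinary;
}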

3. Cross-Platform Path Hell

Windows vs. Unix paths caused endless bugs. For example, node_modules was not ignored correctly on Windows because of the backslashes in its paths.

The Solution: Normalize everything.

const normalizePath = (p) => p.split(path.sep).join('/');
// On Windows, "src\\utils\\index.js" becomes "src/utils/index.js"
// (on macOS/Linux path.sep is already "/", so paths pass through unchanged)
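
If paths can also arrive from another machine (say, a config file written on Windows and read on Linux), splitting on path.sep won't touch the backslashes. Here is a small variant, assuming that scenario, that always splits on the Windows separator. The name toPosixPath is hypothetical.

```javascript
const path = require('path');

// Hypothetical helper: convert Windows-style separators to forward slashes
// regardless of which OS the code is running on.
function toPosixPath(p) {
  return p.split(path.win32.sep).join(path.posix.sep);
}

console.log(toPosixPath('src\\utils\\index.js')); // src/utils/index.js
console.log(toPosixPath('src/utils/index.js'));   // src/utils/index.js
```

The trade-off: a backslash is a legal filename character on macOS and Linux, so this variant would mangle such names. That's rare in practice, but worth knowing before adopting it.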

4. The Recursion Problem

Users kept accidentally including repomeld's own output files, causing infinite loops and massive file bloat.

The Solution: Hard-coded ignore + pattern matching.

// Always ignore anything starting with "repomeld"
if (entry.name.startsWith('repomeld')) continue;
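
A prefix check only helps while the output keeps its default name. As a sketch of a slightly stricter guard, assuming the output path can be set to something else, you could also compare resolved paths. The function isOwnOutput and its parameters are hypothetical, not part of repomeld.

```javascript
const path = require('path');

// Hypothetical guard: true when an entry should be skipped because it is
// the tool's own output file, or its name starts with the reserved prefix.
function isOwnOutput(entryPath, outputPath, prefix = 'repomeld') {
  const sameFile = path.resolve(entryPath) === path.resolve(outputPath);
  return sameFile || path.basename(entryPath).startsWith(prefix);
}
```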

Features That Made the Difference

1. Gitignore Support (Most Requested)

Respecting .gitignore was non-negotiable. I used the ignore package:

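const fs = require('fs');
const ignore = require('ignore');

const ig = ignore();
ig.add(fs.readFileSync('.gitignore', 'utf8'));
// Paths passed to ignores() must be relative to the .gitignore location
if (ig.ignores('node_modules/lodash/index.js')) {
  // Skip it
}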

2. Three Output Styles

Users wanted flexibility:

  • Banner: Clear visual separation with metadata
  • Markdown: Perfect for pasting into AI tools
  • Minimal: Just the code, nothing else
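
To give a feel for how the three styles differ, here is an illustrative sketch. The exact layout repomeld emits isn't shown in this post, so the formatters object, formatFile, and the file shape ({ path, content }) are all assumptions.

```javascript
const FENCE = '`'.repeat(3); // Markdown code fence, built without typing it literally
const RULE = '='.repeat(60);

// Hypothetical formatters keyed by style name; the layouts are illustrative only.
const formatters = {
  banner: (file) =>
    `${RULE}\nFILE: ${file.path} (${file.content.length} chars)\n${RULE}\n${file.content}\n`,
  markdown: (file) => `## ${file.path}\n\n${FENCE}\n${file.content}\n${FENCE}\n`,
  minimal: (file) => `${file.content}\n`,
};

function formatFile(file, style = 'banner') {
  if (!Object.hasOwn(formatters, style)) throw new Error(`Unknown style: ${style}`);
  return formatters[style](file);
}

console.log(formatFile({ path: 'src/index.js', content: 'console.log(1);' }, 'markdown'));
```

Keeping each style as a plain function in a lookup table means adding a fourth style is a one-line change.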

3. Smart Defaults with Overrides

{
  "ignore": [
    "node_modules",  // Always ignored
    "dist",         // Build output
    "package.json"  // Can be overridden with --force-include
  ]
}

4. Auto-Numbered Backups

Never overwrite existing files:

repomeld_output.txt       # First run
repomeld_output__2.txt    # Second run
repomeld_output__3.txt    # Third run
repomeld_zips/            # Automatic zip backups
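
The naming scheme above can be sketched as a loop that probes for the first free name. The helper nextOutputPath and its injectable exists parameter are hypothetical, written here only to illustrate the idea.

```javascript
const fs = require('fs');
const path = require('path');

// Hypothetical helper: return the first free path in the sequence
// base.ext, base__2.ext, base__3.ext, ... inside dir.
function nextOutputPath(dir, baseName = 'repomeld_output', ext = '.txt', exists = fs.existsSync) {
  let candidate = path.join(dir, `${baseName}${ext}`);
  for (let n = 2; exists(candidate); n += 1) {
    candidate = path.join(dir, `${baseName}__${n}${ext}`);
  }
  return candidate;
}

// Usage: const target = nextOutputPath(process.cwd());
```

Checking and then writing leaves a small race window if two runs start at once. Opening the file with the 'wx' flag, which fails when the path already exists, is one way to close it.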

Lessons Learned

1. Start with the CLI, Not the Library

Building the command-line interface first forced me to think about user experience from day one.

2. Test on Windows Early

Most of my early users were on Windows, but I developed on Mac. Big mistake. Add Windows to your CI pipeline immediately.

3. Feature Flags Are Your Friend

repomeld --dry-run      # Preview without writing
repomeld --no-backup    # Skip zip creation
repomeld --no-update-check  # For CI/CD
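
For anyone wiring up similar flags, here is one possible sketch using Node's built-in util.parseArgs (available in recent Node versions). How repomeld itself parses arguments isn't covered in this post, so parseFlags and the shape of its return value are assumptions.

```javascript
const { parseArgs } = require('util');

// Hypothetical parser for the three flags above. Unknown flags throw,
// because parseArgs is strict by default.
function parseFlags(argv) {
  const { values } = parseArgs({
    args: argv,
    options: {
      'dry-run': { type: 'boolean' },
      'no-backup': { type: 'boolean' },
      'no-update-check': { type: 'boolean' },
    },
  });
  return {
    dryRun: Boolean(values['dry-run']),
    backup: !values['no-backup'],
    updateCheck: !values['no-update-check'],
  };
}

console.log(parseFlags(process.argv.slice(2)));
```

Translating the negative flags into positive booleans (backup, updateCheck) at the boundary keeps the rest of the code free of double negatives.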

4. Documentation Is Not Optional

My initial README was three sentences. After expanding it to 400+ lines, downloads increased 5x.


What's Next?

repomeld v4.0 is in development with:

  • Watch mode: Automatically rebuild when files change
  • Diff views: Show what changed between runs
  • Dependency graphs: Visualize file relationships
  • AI prompts: Generate optimized prompts from your codebase

Try It Yourself

npm install -g repomeld
cd your-project
repomeld

That's it. You'll get a single file containing your entire codebase - perfect for AI context, code reviews, or archiving.


The Human Side

Building repomeld taught me that the best tools solve real problems simply. Not every CLI needs AI or blockchain or microservices. Sometimes, you just need to combine text files.

I'm currently available for freelance and full-time opportunities. If you need a developer who understands both the technical and human sides of building developer tools, let's talk.

📧 susheelhbti@gmail.com


Have a suggestion for repomeld? Open an issue on GitHub. Found a bug? PRs welcome. Want to hire me? Email me.

Happy coding! 🔥


Appendix: Architecture Diagram

repomeld/
├── bin/
│   └── cli.js                 # CLI entry point
├── src/
│   ├── core/
│   │   ├── fileScanner.js     # Async file traversal
│   │   ├── ignoreBuilder.js   # Gitignore parsing
│   │   ├── formatter.js       # Output formatting
│   │   └── progress.js        # Progress indicator
│   ├── utils/
│   │   ├── helpers.js         # Utilities
│   │   ├── constants.js       # Config
│   │   └── backup.js          # Zip creation
│   └── index.js               # Main orchestration

Key takeaway: Clean architecture and separation of concerns made the codebase maintainable as features grew.
