Open-sourcing a private project should be simple: push to a public repo and write a README. In practice, it's terrifying.
GitGuardian's 2026 report found 29 million secrets leaked on GitHub last year, and AI-assisted commits leak credentials at 2x the baseline rate. Every .env file, every hardcoded API key, every internal domain reference, every docker-compose.yml with a plaintext database password — one careless git push away from being public.
I kept running into this every time I wanted to open-source something I'd built with Claude Code. The manual process — grep for secrets, replace internal references, check git history, write docs — was tedious, error-prone, and exactly the kind of thing that should be automated.
So I built 3 Claude Code agents that do it.
herakles-dev/opensource-pipeline
Safely open-source any project with Claude Code. 3-agent pipeline that strips secrets, verifies sanitization, and generates professional docs. Just say /opensource fork my-project.
opensource-pipeline
Safely open-source any project with Claude Code. A 3-agent pipeline that strips secrets, verifies sanitization, and generates professional documentation — so you can go from private repo to public GitHub in minutes.
Why
Open-sourcing a project is scary. Did you catch every API key? Every hardcoded password? Every internal domain reference? Every .env file?
This pipeline automates the boring, error-prone parts:
- Forker agent strips secrets, replaces internal references, generates .env.example
- Sanitizer agent independently audits the fork with 30+ detection patterns (secrets, PII, internal refs, dangerous files, git history)
- Packager agent generates CLAUDE.md, setup.sh, README.md, LICENSE, CONTRIBUTING.md, and GitHub issue templates
The sanitizer is paranoid by design — false positives are acceptable, false negatives are not.
Quick Start
git clone https://github.com/herakles-dev/opensource-pipeline.git
cd opensource-pipeline
./setup.sh
That's it. The installer copies the skill and agents into your ~/.claude/ directory.
Then open Claude Code in any project:
cd…
How it works
One command:
/opensource fork my-project
Behind the scenes, 3 agents chain together in sequence. Each is a markdown file — no runtime, no dependencies, no Docker. Claude Code reads the instructions and follows the protocol.
Stage 1: The Forker
Copies your project, then hunts for secrets using 20 regex patterns:
- AWS credentials (AKIA*, aws_secret_access_key)
- GitHub tokens (ghp_*, ghs_*, github_pat_*)
- Google OAuth (GOCSPX-*)
- JWT tokens (eyJ*)
- Private keys (-----BEGIN RSA PRIVATE KEY-----)
- Database connection strings (postgres, mysql, mongodb, redis URLs with credentials)
- Slack webhooks, SendGrid keys, Mailgun keys
- High-entropy strings in config files
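To make the pattern list concrete, here is a minimal Python sketch of a line-by-line regex scan. The pipeline itself is markdown instructions rather than code, so everything here is an assumption for illustration: the names `SECRET_PATTERNS` and `scan_text` are hypothetical, and the regexes are approximations of six of the categories above, not the forker's actual 20 patterns.

```python
import re

# Illustrative subset only. These regexes approximate the categories listed
# above; they are NOT the pipeline's real pattern list.
SECRET_PATTERNS = {
    "aws_access_key_id": re.compile(r"\bAKIA[0-9A-Z]{16}\b"),
    "github_token": re.compile(
        r"\b(?:ghp|ghs)_[A-Za-z0-9]{36}\b|\bgithub_pat_[A-Za-z0-9_]{22,}\b"
    ),
    "google_oauth_secret": re.compile(r"\bGOCSPX-[A-Za-z0-9_-]{20,}"),
    "jwt": re.compile(r"\beyJ[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+\.[A-Za-z0-9_-]+"),
    "private_key": re.compile(r"-----BEGIN (?:RSA |EC |OPENSSH )?PRIVATE KEY-----"),
    # Only URLs that embed a password (scheme://user:password@host) match.
    "db_url_with_credentials": re.compile(
        r"\b(?:postgres(?:ql)?|mysql|mongodb(?:\+srv)?|redis)://[^\s:/@]*:[^\s/@]+@"
    ),
}


def scan_text(text):
    """Return (line_number, pattern_name) for every match, in line order."""
    findings = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        for name, pattern in SECRET_PATTERNS.items():
            if pattern.search(line):
                findings.append((lineno, name))
    return findings


sample = (
    "AWS_KEY=AKIAIOSFODNN7EXAMPLE\n"
    "DATABASE_URL=postgres://admin:hunter2@db.internal:5432/app\n"
    "REDIS_URL=redis://localhost:6379/0\n"
)
print(scan_text(sample))
```

In this sketch the Redis URL is not flagged, because it carries no password; only the AWS key and the Postgres URL with embedded credentials are reported.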
Every secret found gets extracted to a .env.example with a placeholder. Internal references — your domains, absolute paths, IP addresses, Docker network names, usernames — are replaced with configurable placeholders.
The forker never removes functionality. It parameterizes everything so the project still runs after someone does cp .env.example .env and fills in their values.
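The parameterization step can be sketched in a few lines of Python. This is an assumption-laden illustration, not the forker's real logic: `parameterize` is a hypothetical helper, the `${VAR}` reference style is the one Docker Compose and shells understand (other file types would need their own lookup), and `changeme` is an arbitrary placeholder value.

```python
def parameterize(text, secrets):
    """Swap literal secret values for ${VAR} references.

    `secrets` maps a (hypothetical) env var name to the literal value found.
    Returns the rewritten text and the contents of a .env.example file.
    """
    # Replace longer values first so a secret that contains another
    # secret is not partially rewritten.
    by_length = sorted(secrets.items(), key=lambda kv: len(kv[1]), reverse=True)
    for var_name, value in by_length:
        text = text.replace(value, "${" + var_name + "}")
    env_example = "".join(f"{name}=changeme\n" for name in sorted(secrets))
    return text, env_example


compose = "environment:\n  POSTGRES_PASSWORD: hunter2\n"
rewritten, env_example = parameterize(compose, {"POSTGRES_PASSWORD": "hunter2"})
print(rewritten)
print(env_example)
```

The setting itself survives; only its value moves out of the file, which is what keeps the project runnable once someone fills in their own `.env`.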
Stage 2: The Sanitizer
This is where the design gets interesting.
The sanitizer doesn't trust the forker. It's a completely independent, read-only auditor that re-scans the entire fork from scratch across 6 categories:
| Category | Severity | What it checks |
|---|---|---|
| Secrets | CRITICAL | All 20 regex patterns, re-applied independently |
| PII | CRITICAL | Personal emails, phone numbers, private IPs, SSH strings |
| Internal references | CRITICAL | Custom domains, home directory paths, secret file refs |
| Dangerous files | CRITICAL | .env, .pem, .key, credentials.json, session state |
| Config completeness | WARNING | Every env var in code has a matching .env.example entry |
| Git history | CRITICAL | Secrets in past commits, clean single-commit history |
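The config completeness row is the easiest one to picture in code. A rough Python sketch, assuming the project reads configuration through `os.environ`, `os.getenv`, or Node's `process.env` (the function names and the regex are hypothetical, and a real check would need to cover more access styles):

```python
import re

# Matches os.environ["NAME"], os.environ.get("NAME"), os.getenv("NAME")
# and process.env.NAME. Illustrative only; other access styles are missed.
ENV_USE = re.compile(
    r"""os\.environ(?:\.get\(|\[)\s*["']([A-Z_][A-Z0-9_]*)["']"""
    r"""|os\.getenv\(\s*["']([A-Z_][A-Z0-9_]*)["']"""
    r"""|process\.env\.([A-Z_][A-Z0-9_]*)"""
)


def env_vars_used(source):
    """Names of environment variables referenced in source code."""
    return {name for groups in ENV_USE.findall(source) for name in groups if name}


def env_vars_declared(env_example):
    """Names declared as KEY=value lines in a .env.example file."""
    declared = set()
    for line in env_example.splitlines():
        line = line.strip()
        if line and not line.startswith("#") and "=" in line:
            declared.add(line.split("=", 1)[0].strip())
    return declared


def missing_from_env_example(source, env_example):
    """Variables the code reads that .env.example never mentions."""
    return sorted(env_vars_used(source) - env_vars_declared(env_example))
```

Anything this returns would map to a WARNING rather than a CRITICAL finding: the fork is not leaking anything, but a new user would hit an undocumented setting.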
A single critical finding blocks release. The verdict is PASS, FAIL, or PASS WITH WARNINGS — no grey area.
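The verdict rule is simple enough to state as code. A minimal Python sketch, where the `Finding` type and `verdict` function are hypothetical names and only the three outcomes and the blocking rule come from the description above:

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Finding:
    category: str   # e.g. "Secrets", "Config completeness"
    severity: str   # "CRITICAL" or "WARNING"
    detail: str


def verdict(findings):
    """One CRITICAL finding fails the audit; warnings alone do not."""
    if any(f.severity == "CRITICAL" for f in findings):
        return "FAIL"
    if any(f.severity == "WARNING" for f in findings):
        return "PASS WITH WARNINGS"
    return "PASS"
```

Note that the number of warnings never matters here: ten warnings still pass, and one critical finding still fails.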
The sanitizer can report. It cannot fix. That separation of concerns is intentional. If the forker and sanitizer were the same agent, it would silently "fix" things it found — and you'd never know what it missed because it was also the one checking.
Stage 3: The Packager
Detects your tech stack (package.json, requirements.txt, Cargo.toml, go.mod, docker-compose.yml) and generates:
- CLAUDE.md: so anyone who clones your repo and opens Claude Code can be productive immediately. Commands, architecture, key files, configuration: the operator's manual for Claude.
- setup.sh: one-command bootstrap. Checks prerequisites, copies .env.example, installs dependencies.
- README.md: features, quick start, prerequisites, Docker instructions, and a "Using with Claude Code" section.
- LICENSE, CONTRIBUTING.md, GitHub issue templates
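Stack detection by marker file can be sketched as follows. This is a Python illustration under assumptions: the `MARKERS` table and `detect_stack` are hypothetical, the stack labels are my own, and the packager may well weigh more signals than the presence of a file.

```python
from pathlib import Path

# Marker file -> stack label. The file names come from the list above;
# the labels are assumptions for illustration.
MARKERS = {
    "package.json": "Node.js",
    "requirements.txt": "Python",
    "Cargo.toml": "Rust",
    "go.mod": "Go",
    "docker-compose.yml": "Docker Compose",
}


def detect_stack(project_dir):
    """Return the label of every marker file found at the project root."""
    root = Path(project_dir)
    return [label for marker, label in MARKERS.items() if (root / marker).is_file()]
```

A project can match several markers at once (for example a Node.js app shipped with a compose file), which is why this returns a list rather than a single label.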
The architecture is 4 markdown files
skills/opensource/SKILL.md # Orchestrator — routes commands, chains agents
agents/opensource-forker.md # Stage 1: Copy, strip, replace, .env.example
agents/opensource-sanitizer.md # Stage 2: Independent read-only audit
agents/opensource-packager.md # Stage 3: Generate CLAUDE.md, setup.sh, README
Total: 1,506 lines. No package.json. No requirements.txt. No build step. Each agent is a .md file with YAML frontmatter (name, description, model) and a body of natural language instructions that Claude Code follows.
The "code" is English.
Installation
git clone https://github.com/herakles-dev/opensource-pipeline.git
cd opensource-pipeline
./setup.sh # Copies 4 files into ~/.claude/
Then open Claude Code in any project:
cd ~/my-private-project
claude
# Say: /opensource fork my-project
You can also say "open source this project" or "make this public" — the skill triggers on natural language too.
The adversarial review
Before publishing this repo, I ran 3 review agents against it in parallel:
- Sanitizer audit — scanned for any leaked secrets, PII, internal references
- Independence review — checked if the project works on a fresh machine with zero knowledge of my platform
- Functional simulation — traced the entire setup.sh and skill execution flow as a new user
They found 12 issues, including platform-specific metadata in agent frontmatter, a path resolution bug, a missing rsync check in setup.sh, and a personal name in the git commit author field. All were fixed before the push.
The pipeline open-sourced itself.
What I learned
1. Agents are just markdown.
This was the biggest revelation. 1,506 lines across 4 files. No packages, no runtime, no build. Claude Code reads the markdown and follows the protocol. If you can write clear instructions for a human, you can write an agent.
The YAML frontmatter is 4 lines:
---
name: opensource-sanitizer
description: "Verify open-source fork is fully sanitized..."
model: sonnet
color: red
---
Everything below it is the agent's brain — written in plain English.
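To show how little structure there is, here is a small Python sketch that splits such a file into its metadata and its instructions. It is not how Claude Code reads agents (that is not described here); `split_frontmatter` is a hypothetical helper that handles only flat `key: value` lines, and a real YAML parser would be needed for anything nested.

```python
def split_frontmatter(markdown):
    """Split an agent file into (metadata dict, instruction body).

    Simplification: only flat `key: value` lines are understood.
    """
    lines = markdown.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, markdown  # no frontmatter at all
    try:
        end = next(
            i for i, line in enumerate(lines[1:], start=1) if line.strip() == "---"
        )
    except StopIteration:
        return {}, markdown  # no closing delimiter, treat as plain markdown
    meta = {}
    for line in lines[1:end]:
        if ":" in line:
            key, _, value = line.partition(":")
            meta[key.strip()] = value.strip().strip('"')
    body = "\n".join(lines[end + 1:])
    return meta, body
```

Called on the example above, this yields a four-key dictionary (name, description, model, color) and the plain-English body as one string.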
2. Zero trust between agents is worth the redundancy.
The sanitizer re-does the forker's work. In a traditional system, that's waste. In a security-critical pipeline, it's the feature. The forker's job is to transform. The sanitizer's job is to verify. If one agent does both, you've created a system that can silently paper over its own mistakes.
3. The paranoid option is the right default.
We tuned every detection threshold toward false positives. The sanitizer flags anything that looks remotely suspicious. A false positive is an annoying warning you dismiss. A false negative is a secret on GitHub. The asymmetry is extreme — always choose paranoia.
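One place where a threshold like this shows up is the high-entropy string check. A Python sketch under assumptions: Shannon entropy is a standard way to score randomness, but the `looks_like_secret` helper, the 3.5 bits-per-character threshold, and the 16-character minimum are illustrative values, not the pipeline's tuned settings.

```python
import math
from collections import Counter


def shannon_entropy(s):
    """Bits of entropy per character, from the string's own character frequencies."""
    if not s:
        return 0.0
    n = len(s)
    return -sum((c / n) * math.log2(c / n) for c in Counter(s).values())


def looks_like_secret(value, threshold=3.5, min_length=16):
    """Flag long strings whose characters look random.

    Lowering `threshold` flags more strings: more false positives,
    fewer false negatives. Both defaults are assumptions.
    """
    return len(value) >= min_length and shannon_entropy(value) >= threshold
```

With these defaults a value like `localhost` is ignored for being too short and a run of repeated characters is ignored for having no entropy, while a 16-character string of distinct characters (4 bits per character) is flagged. Tuning toward paranoia means nudging `threshold` down and accepting the extra warnings.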
4. CLAUDE.md is the highest-leverage file in any repo.
Not README (that's for humans browsing GitHub). Not .env.example (that's for configuration). CLAUDE.md is the file that makes Claude Code productive in your project immediately. The packager generates one automatically, and it's often the single most useful artifact from the pipeline.
Contributing
The repo has 5 open issues tagged "good first issue":
- Add detection patterns for Azure, GCP, Stripe, OpenAI, Anthropic API keys
- Add Rust, Go, Java/Kotlin stack detection to the packager
- Monorepo support
- GitHub Actions CI for pattern testing
- Community feedback thread
The easiest contribution is adding a regex pattern to the sanitizer. Find a secret type we don't detect, write the pattern, submit a PR.
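Before submitting a pattern, it helps to check it against a fake positive and a near miss. A hedged Python sketch using Stripe live keys as the example: the `sk_live_` prefix is Stripe's documented convention, but the 24-character minimum, the constant name, and the helper are assumptions, and the actual agent files may express patterns differently.

```python
import re

# Hypothetical new pattern. The minimum suffix length is an assumption.
STRIPE_LIVE_KEY = re.compile(r"\bsk_live_[A-Za-z0-9]{24,}\b")


def matches_stripe_key(line):
    """True if the line contains something shaped like a Stripe live key."""
    return STRIPE_LIVE_KEY.search(line) is not None


# Build the fake key at runtime so no key-shaped literal sits in the repo.
fake_key = "sk_live_" + "a1B2c3D4" * 3
assert matches_stripe_key(f"STRIPE_KEY={fake_key}")
assert not matches_stripe_key("STRIPE_KEY=sk_live_short")
```

A pair of checks like this, one that must match and one that must not, is also a natural starting point for the pattern-testing CI mentioned in the issue list.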
Repo: herakles-dev/opensource-pipeline
MIT license. The whole thing is markdown. PRs welcome.