Rahul Kashayp

Posted on Feb 5 • Originally published at github.com

I Built Git for LLM Prompts — Here is What 332 Tests Taught Me

#ai #machinelearning #python #opensource

I was managing 50+ LLM prompts in Google Docs.

That setup broke my production AI three times in one month.

Each time, I spent hours manually testing versions to find what changed.

Sound familiar?

The Problem

Git works great for code. But prompts are different:

Semantic changes matter more than text diff — changing "be concise" to "be thorough" is a behavioral shift
Version history is scattered — Google Docs, Notion, or worse, inline comments
No way to query by performance — Which version had the best success rate?
Sharing improvements is manual — Copy-paste and hope you do not break anything
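
To make the first point concrete, here is a small sketch using Python's standard difflib module. The two prompt strings are made up for illustration. A character-level similarity score says the versions are nearly identical, even though the instruction has flipped.

```python
from difflib import SequenceMatcher

# Two hypothetical prompt versions that differ by a single word.
old_prompt = "You are a helpful assistant. Be concise."
new_prompt = "You are a helpful assistant. Be thorough."

# Character-level similarity in [0, 1]; 1.0 means identical text.
similarity = SequenceMatcher(None, old_prompt, new_prompt).ratio()
print(f"text similarity: {similarity:.2f}")
```

A text diff sees a one-word edit. The model sees a different job description.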

I needed version control that understands prompts, not just tracks text.

Meet PIT (Prompt Information Tracker)

pip install prompt-pit

PIT is "Git for prompts" — semantic version control designed for LLM workflows.

1. Binary Search for Broken Versions

Your AI started giving weird answers. Which version broke it?

pit bisect start --failing-input "why is the sky blue?"
pit bisect good v1
pit bisect bad v50

Binary search finds the culprit in about six checks instead of up to fifty. Minutes, not hours.
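
Here is a rough sketch of the idea in plain Python. This is not PIT's actual implementation; the function name first_bad_version and the is_bad check are made up for illustration, and the sketch assumes that once a version is bad, every later version is bad too.

```python
from typing import Callable, List, Sequence


def first_bad_version(versions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest bad version.

    Assumes versions[0] is known good, versions[-1] is known bad,
    and every version after the first bad one is also bad.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(versions[mid]):
            hi = mid  # culprit is at mid or earlier
        else:
            lo = mid  # culprit is after mid
    return versions[hi]


versions = [f"v{i}" for i in range(1, 51)]
checked: List[str] = []


def is_bad(version: str) -> bool:
    """Stand-in for a real check; pretend the regression landed in v37."""
    checked.append(version)
    return int(version[1:]) >= 37


culprit = first_bad_version(versions, is_bad)
print(culprit, "found after", len(checked), "checks")
```

In practice, is_bad would run the failing input against that version and judge the output, which is the slow part. Halving the search range each time is what keeps the number of checks small.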

2. Time-Travel Replay

Same input. 50 versions. Side-by-side comparison.

pit replay run my-prompt --input "Hello" --all

See exactly how behavior evolved. No more "it worked yesterday" mysteries.
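
Conceptually, a replay is a loop over versions with the input held fixed. The sketch below is an assumption about the shape of the idea, not PIT's code: replay_all and fake_model are hypothetical names, and fake_model stands in for a real LLM call so the example runs offline.

```python
from typing import Callable, Dict


def replay_all(
    versions: Dict[str, str],
    user_input: str,
    run_model: Callable[[str, str], str],
) -> Dict[str, str]:
    """Run the same input against every prompt version and collect the outputs."""
    return {name: run_model(prompt, user_input) for name, prompt in versions.items()}


def fake_model(prompt: str, user_input: str) -> str:
    """Stand-in for a real model call (hypothetical, deterministic)."""
    style = "short" if "concise" in prompt else "long"
    return f"{style} reply to {user_input!r}"


versions = {"v1": "Be concise.", "v2": "Be thorough."}
results = replay_all(versions, "Hello", fake_model)
for name, output in results.items():
    print(f"{name}: {output}")
```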

3. Query by Behavior

Find versions that actually matter:

pit log --where "success_rate > 0.9"
pit log --where "content contains 'be concise' AND tags contains production"

Query by metrics, not just metadata.
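
Under the hood, a query like this is a filter over version records that carry metrics alongside content. Here is a minimal sketch of that idea; the VersionRecord fields and the sample numbers are invented for illustration and do not reflect how PIT stores data.

```python
from dataclasses import dataclass, field
from typing import List, Set


@dataclass
class VersionRecord:
    """Hypothetical record: one prompt version plus its measured metrics."""
    name: str
    content: str
    success_rate: float
    tags: Set[str] = field(default_factory=set)


history: List[VersionRecord] = [
    VersionRecord("v1", "Answer the question.", 0.72),
    VersionRecord("v2", "Answer the question. Be concise.", 0.94, {"production"}),
    VersionRecord("v3", "Answer the question. Be thorough.", 0.91),
]

# Equivalent in spirit to: success_rate > 0.9
high_performers = [v.name for v in history if v.success_rate > 0.9]

# Equivalent in spirit to: content contains 'be concise' AND tags contains production
concise_in_prod = [
    v.name
    for v in history
    if "be concise" in v.content.lower() and "production" in v.tags
]

print(high_performers, concise_in_prod)
```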

4. Shareable Patches

Your teammate improved a prompt. You want that improvement.

pit patch create prompt v1 v2 --output fix.patch
pit patch apply fix.patch --to my-prompt

Like Git patches, but for prompt semantics.
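
For comparison, this is what a plain text-level patch between two versions looks like using Python's standard difflib. It only captures the line changes; whatever semantic information PIT adds on top is not shown here, and the two prompt strings are made up.

```python
from difflib import unified_diff

# Two hypothetical versions of the same prompt.
v1 = "You are a support agent.\nBe concise.\n"
v2 = "You are a support agent.\nBe thorough.\n"

# Build a unified diff, the same format Git uses for patches.
patch = "".join(
    unified_diff(
        v1.splitlines(keepends=True),
        v2.splitlines(keepends=True),
        fromfile="v1",
        tofile="v2",
    )
)
print(patch)
```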

5. Git-Style Hooks

Prevent bad prompts from reaching production:

pit hooks install pre-commit
# Scans for security issues before every commit

CI/CD for prompts. Finally.
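
As an illustration of the kind of check a pre-commit hook could run, here is a small scanner that flags strings that look like leaked credentials. The patterns and the scan_prompt name are assumptions for this sketch; the rules PIT actually applies may be different.

```python
import re
from typing import List, Tuple

# Hypothetical patterns; a real scanner would use a much longer list.
SUSPICIOUS_PATTERNS = {
    "possible API key": re.compile(r"\bsk-[A-Za-z0-9]{20,}\b"),
    "possible password": re.compile(r"password\s*[:=]\s*\S+", re.IGNORECASE),
}


def scan_prompt(text: str) -> List[Tuple[int, str]]:
    """Return (line number, label) for every suspicious match in the prompt."""
    findings = []
    for line_no, line in enumerate(text.splitlines(), start=1):
        for label, pattern in SUSPICIOUS_PATTERNS.items():
            if pattern.search(line):
                findings.append((line_no, label))
    return findings


prompt = (
    "You are a billing assistant.\n"
    "Use the key sk-abcdefghijklmnopqrstuvwx to call the API.\n"
)
findings = scan_prompt(prompt)

# A hook would exit non-zero to block the commit when anything is found.
exit_code = 1 if findings else 0
print(findings, exit_code)
```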

6. Dependencies

Your prompts depend on other prompts. Track it:

pit deps add shared github org/repo/prompts --version v1.0
pit deps install

Like npm for prompts. Version-lock everything.

The Full Feature Set

Feature	What It Does
Bisect	Binary search to find broken versions
Replay	Test same input across all versions
Patches	Export/import prompt changes
Hooks	Pre-commit, post-checkout automation
Bundles	Package and share prompts
Query Language	Search by behavior metrics
Dependencies	External prompt packages
Worktrees	Multiple contexts without switching
Stash	Save WIP with test context
Semantic Merge	Smart conflict detection

332 tests. Production-ready. Open source.

Why This Matters

Prompts are becoming critical infrastructure.

Just like we do not deploy code without version control, we should not deploy prompts without it either.

PIT brings software engineering discipline to prompt engineering:

Traceability (who changed what, when, why)
Reproducibility (checkout any version instantly)
Collaboration (patches, bundles, dependencies)
Quality (hooks, testing, metrics)

Try It

pip install prompt-pit
pit init
pit add my-prompt.md --name "my-prompt"
pit commit my-prompt --message "Initial version"

⭐ Star it on GitHub: github.com/itisrmk/pit

What is Your Biggest Prompt Management Pain?

I built PIT to solve my own headaches.

But I am curious — what frustrates you most about managing prompts in production?

Drop a comment below 👇

PIT is free, open source (MIT), and built with Python + Rich + Typer.
