Rahul Kashayp

Posted on • Originally published at github.com

I Built Git for LLM Prompts — Here is What 332 Tests Taught Me

I was managing 50+ LLM prompts in Google Docs.

That setup broke my production AI three times in one month.

Each time, I spent hours manually testing versions to find what changed.

Sound familiar?


The Problem

Git works great for code. But prompts are different:

  • Semantic changes matter more than text diffs: changing "be concise" to "be thorough" is a behavioral shift
  • Version history is scattered: Google Docs, Notion, or worse, inline comments
  • No way to query by performance: which version had the best success rate?
  • Sharing improvements is manual: copy-paste and hope you do not break anything

I needed version control that understands prompts, not just tracks text.


Meet PIT (Prompt Information Tracker)

pip install prompt-pit

PIT is "Git for prompts" — semantic version control designed for LLM workflows.

1. Binary Search for Broken Versions

Your AI started giving weird answers. Which version broke it?

pit bisect start --failing-input "why is the sky blue?"
pit bisect good v1
pit bisect bad v50

Binary search finds the culprit. Minutes, not hours.
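
To make the idea concrete, here is a minimal sketch of bisecting a version history in Python. It is not PIT's implementation; `bisect_versions` and `is_good` are hypothetical names, and the sketch assumes the first version is known good, the last is known bad, and every version after the first bad one is also bad.

```python
from typing import Callable, Sequence


def bisect_versions(versions: Sequence[str], is_good: Callable[[str], bool]) -> str:
    """Return the first bad version.

    Assumes versions[0] is good, versions[-1] is bad, and that once a
    version is bad every later version is bad too.
    """
    lo, hi = 0, len(versions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_good(versions[mid]):
            lo = mid
        else:
            hi = mid
    return versions[hi]


# Hypothetical history: v1..v50, with the regression introduced in v37.
versions = [f"v{i}" for i in range(1, 51)]
calls = []


def is_good(version: str) -> bool:
    # Stand-in for "run the failing input against this version and check the answer".
    calls.append(version)
    return int(version[1:]) < 37


first_bad = bisect_versions(versions, is_good)
print(first_bad, "found after", len(calls), "checks")
```

With 50 versions, the search needs at most six checks instead of up to 48 sequential ones, which is where the time saving comes from.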

2. Time-Travel Replay

Same input. 50 versions. Instant comparison.

pit replay run my-prompt --input "Hello" --all

See exactly how behavior evolved. No more "it worked yesterday" mysteries.
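
Conceptually, a replay is the same input run against every stored prompt version. The sketch below illustrates that loop under stated assumptions: `replay` and `fake_model` are hypothetical, and `fake_model` is a stand-in for a real LLM call so the example runs offline.

```python
from typing import Callable, Dict


def replay(
    versions: Dict[str, str],
    user_input: str,
    run_model: Callable[[str, str], str],
) -> Dict[str, str]:
    """Run the same input against every prompt version and collect the outputs."""
    return {name: run_model(prompt, user_input) for name, prompt in versions.items()}


def fake_model(prompt: str, user_input: str) -> str:
    # Stand-in for a real LLM call; it only reacts to one keyword in the prompt.
    style = "short" if "concise" in prompt else "long"
    return f"[{style}] reply to {user_input!r}"


versions = {"v1": "Be concise.", "v2": "Be thorough."}
outputs = replay(versions, "Hello", fake_model)

for name, output in outputs.items():
    print(name, "->", output)
```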

3. Query by Behavior

Find versions that actually matter:

pit log --where "success_rate > 0.9"
pit log --where "content contains 'be concise' AND tags contains production"

Query by metrics, not just metadata.
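
As a rough illustration of metric-based filtering, the sketch below evaluates a single `field op number` expression over an in-memory history. The `Version` record and `where` function are hypothetical, and the sketch deliberately does not handle the `contains` or `AND` forms shown above; PIT's real query language may work quite differently.

```python
import operator
from dataclasses import dataclass, field
from typing import List

# Comparison operators supported by this sketch.
OPS = {
    ">": operator.gt,
    "<": operator.lt,
    ">=": operator.ge,
    "<=": operator.le,
    "==": operator.eq,
}


@dataclass
class Version:
    name: str
    content: str
    success_rate: float
    tags: List[str] = field(default_factory=list)


def where(versions: List[Version], expr: str) -> List[Version]:
    """Filter versions with a 'field op number' expression, e.g. 'success_rate > 0.9'."""
    field_name, op, raw_value = expr.split()
    compare = OPS[op]
    threshold = float(raw_value)
    return [v for v in versions if compare(getattr(v, field_name), threshold)]


history = [
    Version("v1", "Be concise.", 0.82),
    Version("v2", "Be thorough.", 0.90),
    Version("v3", "Be thorough. Cite sources.", 0.95, ["production"]),
]

matches = where(history, "success_rate > 0.9")
print([v.name for v in matches])
```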

4. Shareable Patches

Your teammate improved a prompt. You want that improvement.

pit patch create prompt v1 v2 --output fix.patch
pit patch apply fix.patch --to my-prompt

Like Git patches, but for prompt semantics.
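
For comparison, this is what a plain text-level patch looks like with Python's standard `difflib`. It is only a line-based baseline, not PIT's patch format: the delta from `difflib.ndiff` carries both versions, so `difflib.restore` can rebuild either side, but it cannot be applied to a different base prompt. The prompt texts are made up for illustration.

```python
import difflib

# Two hypothetical versions of the same prompt, split into lines.
v1 = "You are a helpful assistant.\nBe concise.\n".splitlines(keepends=True)
v2 = "You are a helpful assistant.\nBe thorough.\nCite sources.\n".splitlines(keepends=True)

# "Create" the patch: a line-by-line delta between the two versions.
patch = list(difflib.ndiff(v1, v2))

# "Apply" it: rebuild the second version from the delta alone.
restored = "".join(difflib.restore(patch, 2))

print("".join(patch))
```

A text diff like this reports that one line changed and one was added. It says nothing about the fact that "concise" to "thorough" is a change in behavior, which is the gap a semantic patch is meant to cover.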

5. Git-Style Hooks

Prevent bad prompts from reaching production:

pit hooks install pre-commit
# Scans for security issues before every commit

CI/CD for prompts. Finally.
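
A pre-commit check of this kind can be as simple as a few pattern scans that return a non-zero exit code on a match. The sketch below is an assumption about what such a hook might look for, not the checks PIT ships; the patterns and the `scan_prompt` and `pre_commit` names are illustrative only.

```python
import re
from typing import List

# Illustrative patterns only; a real scanner would need a much broader rule set.
SUSPICIOUS = {
    "possible API key": re.compile(r"sk-[A-Za-z0-9]{20,}"),
    "injection phrase": re.compile(r"ignore (all )?previous instructions", re.IGNORECASE),
}


def scan_prompt(text: str) -> List[str]:
    """Return the label of every suspicious pattern found in the prompt text."""
    return [label for label, pattern in SUSPICIOUS.items() if pattern.search(text)]


def pre_commit(text: str) -> int:
    """Return a shell-style exit code: 0 lets the commit through, 1 blocks it."""
    findings = scan_prompt(text)
    for finding in findings:
        print(f"blocked: {finding}")
    return 1 if findings else 0


leaky_prompt = "Call the API with key sk-" + "a" * 24
clean_prompt = "Be concise."

print(pre_commit(leaky_prompt), pre_commit(clean_prompt))
```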

6. Dependencies

Your prompts depend on other prompts. Track it:

pit deps add shared github org/repo/prompts --version v1.0
pit deps install

Like npm for prompts. Version-lock everything.
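
Version-locking generally means recording the exact version and a content hash, so a later install can detect drift. The sketch below shows that general idea under stated assumptions: the lockfile layout and the `lock_entry` and `verify` helpers are hypothetical, not PIT's actual format.

```python
import hashlib
import json


def lock_entry(name: str, source: str, version: str, content: str) -> dict:
    """Pin a dependency to an exact version plus a hash of its content."""
    digest = hashlib.sha256(content.encode("utf-8")).hexdigest()
    return {"name": name, "source": source, "version": version, "sha256": digest}


def verify(entry: dict, content: str) -> bool:
    """Check that fetched content still matches the hash recorded in the lock entry."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest() == entry["sha256"]


shared_prompt = "You are a helpful assistant."
entry = lock_entry("shared", "org/repo/prompts", "v1.0", shared_prompt)

# Hypothetical lockfile layout, serialized as JSON.
lockfile = json.dumps({"dependencies": [entry]}, indent=2)
print(lockfile)
```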


The Full Feature Set

| Feature | What It Does |
| --- | --- |
| Bisect | Binary search to find broken versions |
| Replay | Test same input across all versions |
| Patches | Export/import prompt changes |
| Hooks | Pre-commit, post-checkout automation |
| Bundles | Package and share prompts |
| Query Language | Search by behavior metrics |
| Dependencies | External prompt packages |
| Worktrees | Multiple contexts without switching |
| Stash | Save WIP with test context |
| Semantic Merge | Smart conflict detection |

332 tests. Production-ready. Open source.


Why This Matters

Prompts are becoming critical infrastructure.

Just like we do not deploy code without version control, we should not deploy prompts without it either.

PIT brings software engineering discipline to prompt engineering:

  • Traceability (who changed what, when, why)
  • Reproducibility (checkout any version instantly)
  • Collaboration (patches, bundles, dependencies)
  • Quality (hooks, testing, metrics)

Try It

pip install prompt-pit
pit init
pit add my-prompt.md --name "my-prompt"
pit commit my-prompt --message "Initial version"

Star it on GitHub: github.com/itisrmk/pit


What is Your Biggest Prompt Management Pain?

I built PIT to solve my own headaches.

But I am curious — what frustrates you most about managing prompts in production?

Drop a comment below 👇


PIT is free, open source (MIT), and built with Python + Rich + Typer.
