This is a submission for the Hermes Agent Challenge.
My Hermes research agent started giving shorter, vaguer answers. I assumed it was a model issue. I tweaked temperatures, changed sampling params, added more context. Two days later I found the real cause: someone had edited the system prompt template to "clean it up," and the rendered output had changed.
I had no audit trail. No way to know when the prompt had changed or what it had looked like before.
That's prompt-version-pin.
The idea
Pin a prompt at the moment you decide it's correct. At runtime, verify the actual prompt matches the pin. If it drifted, you know immediately — before the agent produces a single output — instead of debugging backward from weird behavior.
One function to pin
from prompt_version_pin import pin_prompt
pin = pin_prompt("You are a research agent. Be concise.", label="research-v1")
pin.save("pins/research-agent.json")
The pin file:
{
"label": "research-v1",
"hash": "sha256:3a7f9c...",
"pinned_at": "2026-05-24T14:22:01Z",
"length": 38
}
One function to verify
from prompt_version_pin import PromptPin, verify_prompt
pin = PromptPin.load("pins/research-agent.json")
result = verify_prompt(actual_system_prompt, pin)
if not result.ok:
raise RuntimeError(str(result))
[FAIL] research-v1: prompt has drifted
expected: sha256:3a7f9c...
actual: sha256:b8c2d1...
Put that check at agent startup. If the prompt changed, the agent never runs. You get an error, not a mystery.
Works with message lists
Most Anthropic agents use a messages list, not a raw string. prompt-version-pin handles that too — and by default only pins the system message, not user turns:
from prompt_version_pin import pin_messages, verify_messages
messages = [
{"role": "system", "content": "Be helpful."},
{"role": "user", "content": "Hello"},
]
pin = pin_messages(messages, label="v1")
# Later — user message changed, system prompt is the same
modified = [
{"role": "system", "content": "Be helpful."},
{"role": "user", "content": "A completely different question"},
]
result = verify_messages(modified, pin)
assert result.ok # True — only the system message is pinned
Supports Anthropic content blocks (content: [{"type": "text", "text": "..."}]) automatically.
Normalization prevents false alarms
The library strips trailing whitespace from each line and normalizes line endings to LF before hashing. Copy-pasting from a Windows machine, a trailing space added by your editor — these don't trigger false positives. Real content changes do.
pin1 = pin_prompt("line1 \nline2", label="a") # trailing spaces
pin2 = pin_prompt("line1\r\nline2", label="b") # CRLF
pin3 = pin_prompt("line1\nline2", label="c") # clean
assert pin1.hash == pin2.hash == pin3.hash # all match
CLI for shell scripts and CI
# Pin a prompt
prompt-version-pin pin "You are a helpful assistant." \
--label my-agent \
-o pins/agent.json
# Pin from a file
cat system_prompt.txt | prompt-version-pin pin - --label my-agent -o pins/agent.json
# Verify (exit 0 = ok, exit 1 = drifted)
prompt-version-pin verify "$(cat system_prompt.txt)" pins/agent.json
The exit code makes this composable with CI checks. Add a step that verifies your agent's production prompt against the pinned version in your repo.
The full VerifyResult
When verification fails, there's more than a boolean:
@dataclass
class VerifyResult:
ok: bool
label: str
expected_hash: str # what was pinned
actual_hash: str # what you gave it
length_mismatch: bool # quick hint: did length change too?
length_mismatch is a fast sanity signal. If the hash differs but the length is the same, that's a subtle edit. If both differ, something larger changed.
Zero dependencies
Standard library only: hashlib, json, dataclasses, datetime, pathlib. No third-party packages.
pip install prompt-version-pin
Top comments (0)