This is a submission for the Hermes Agent Challenge.
I have a Hermes orchestrator agent with a 600-line system prompt. Last week I made what I thought was a small change — I added a paragraph about when to delegate to a sub-agent. The model's behavior shifted noticeably. I wanted to do a post-mortem: exactly what did I change?
I had the old prompt in a variable in my code. The new one was in memory. I could not git diff them because they were Python strings, not files. I could print them both and scroll up and down, but that does not work for 600 lines.
prompt-diff is what I wanted.
Basic usage
from prompt_diff import diff_prompts, render_diff
old = [
{"role": "system", "content": "You are a research agent.\n\nWhen you need data, call search_web.\nWhen you have enough data, call write_report."},
{"role": "user", "content": "Research the Hermes model."},
]
new = [
{"role": "system", "content": "You are a research agent.\n\nWhen you need data, call search_web.\nWhen you have enough data, call write_report.\nWhen the task is complex, delegate to a sub-agent using spawn_worker."},
{"role": "user", "content": "Research the Hermes model."},
]
diff = diff_prompts(old, new)
print(render_diff(diff))
Output:
[system] (changed)
--- system (old)
+++ system (new)
@@ -1,4 +1,5 @@
You are a research agent.
When you need data, call search_web.
When you have enough data, call write_report.
+When the task is complex, delegate to a sub-agent using spawn_worker.
That is exactly the unified diff format, per message. The [system] header tells you which message slot changed. The rest is standard + and - lines.
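For intuition about where those lines come from, here is a minimal sketch of a per-message diff built directly on the standard library's difflib.unified_diff, which the technical notes in this post name as the underlying engine. The helper name diff_message and the header labels are assumptions for illustration, not prompt-diff's actual internals, and the exact hunk boundaries depend on how many context lines are requested.

```python
import difflib


def diff_message(role: str, old_text: str, new_text: str, context: int = 3) -> list[str]:
    """Return unified-diff lines for one message slot (hypothetical helper)."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"{role} (old)",
            tofile=f"{role} (new)",
            n=context,
            lineterm="",  # inputs carry no trailing newlines, so headers should not either
        )
    )


old_system = (
    "You are a research agent.\n\n"
    "When you need data, call search_web.\n"
    "When you have enough data, call write_report."
)
new_system = old_system + "\nWhen the task is complex, delegate to a sub-agent using spawn_worker."

lines = diff_message("system", old_system, new_system)
print("\n".join(lines))
```

Identical inputs produce an empty list, which is a convenient signal for a "same" status.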
The data model
diff = diff_prompts(old, new)
diff.any_changed # True
diff.changed_count # 1
diff.added_count # 0 — no new message slots
diff.same_count # 1 — user message unchanged
for md in diff.messages:
    md.status         # "same" | "changed" | "added" | "removed"
    md.role           # "system", "user", "assistant"
    md.old_content    # original string
    md.new_content    # new string
    md.unified_lines  # list of diff lines (for changed messages)
You can use the structured result to build your own report, log the diff to a JSONL trace, or run it in a test that asserts the prompt did not change between deploys.
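As a rough sketch of the JSONL idea, the snippet below turns a diff result into a single JSON line. It reads only the per-message attributes listed above. The function name diff_to_jsonl_line, the record layout, and the SimpleNamespace stand-in (used so the snippet runs without prompt-diff installed) are all assumptions for illustration.

```python
import json
from types import SimpleNamespace


def diff_to_jsonl_line(diff) -> str:
    """Serialize the non-identical messages of a diff as one JSON line."""
    record = {
        "changed": [
            {
                "role": md.role,
                "status": md.status,
                "unified_lines": list(md.unified_lines or []),
            }
            for md in diff.messages
            if md.status != "same"
        ]
    }
    return json.dumps(record, ensure_ascii=False)


# Stand-in object with the same attribute names, so the sketch runs on its own.
fake_diff = SimpleNamespace(
    messages=[
        SimpleNamespace(
            role="system",
            status="changed",
            unified_lines=["+When the task is complex, delegate to a sub-agent."],
        ),
        SimpleNamespace(role="user", status="same", unified_lines=[]),
    ]
)

line = diff_to_jsonl_line(fake_diff)
print(line)
```

In real use you would pass the object returned by diff_prompts and append the line to your trace file.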
Detecting prompt drift in tests
I rely on that last use case myself: a test asserts that the production system prompt has not changed since the last approved version. If someone edits the prompt without updating the golden snapshot, the test fails and shows exactly what changed:
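import pytest
from prompt_diff import diff_prompts, render_diff

def test_system_prompt_unchanged():
    # GOLDEN_MESSAGES and get_current_messages() come from your own project.
    diff = diff_prompts(GOLDEN_MESSAGES, get_current_messages())
    if diff.any_changed:
        # Fail with the rendered diff so the report shows exactly what changed.
        pytest.fail(f"System prompt changed:\n{render_diff(diff)}")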
This is the kind of drift detection that catches "I fixed a typo" commits that accidentally introduce a different instruction.
Handles Anthropic content blocks
If your messages use "content": [{"type": "text", "text": "..."}] instead of a plain string, prompt-diff normalizes them before diffing. Both shapes work:
# String content (OpenAI style)
{"role": "user", "content": "hello"}
# Content block array (Anthropic style)
{"role": "user", "content": [{"type": "text", "text": "hello"}]}
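To make the normalization step concrete, here is a minimal sketch of how both shapes could be flattened to a plain string before diffing. The function name normalize_content, the choice to join text blocks with newlines, the skipping of non-text blocks, and the mapping of None to an empty string are assumptions. The post does not describe how prompt-diff does this internally.

```python
def normalize_content(content) -> str:
    """Flatten a message's content into a plain string (hypothetical helper)."""
    if content is None:
        return ""  # assumption: missing content is treated as empty text
    if isinstance(content, str):
        return content
    # Otherwise assume a list of content blocks and keep only the text ones.
    parts = []
    for block in content:
        if isinstance(block, dict) and block.get("type") == "text":
            parts.append(block.get("text", ""))
    return "\n".join(parts)


openai_style = {"role": "user", "content": "hello"}
anthropic_style = {"role": "user", "content": [{"type": "text", "text": "hello"}]}

same = normalize_content(openai_style["content"]) == normalize_content(anthropic_style["content"])
print(same)  # True
```

With both shapes reduced to the same string, the two messages above would compare as unchanged.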
CLI
For quick one-off diffs from the command line, save your message lists as JSON files:
python3 -m prompt_diff old.json new.json
python3 -m prompt_diff old.json new.json --color --show-same
The exit code is 1 when there are differences (like diff) and 0 when the inputs are identical, so it is easy to wire into a pre-commit hook or CI check.
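If your prompts live in Python variables, one way to produce those JSON files is sketched below. It assumes the CLI expects each file to hold a top-level JSON array of message dicts, which this post implies but does not spell out. The helper name save_messages and the temporary directory are illustrative.

```python
import json
import os
import tempfile


def save_messages(path: str, messages: list[dict]) -> None:
    """Write a message list to disk as a JSON array (assumed CLI input shape)."""
    with open(path, "w", encoding="utf-8") as f:
        json.dump(messages, f, indent=2, ensure_ascii=False)


old = [{"role": "system", "content": "You are a research agent."}]
new = [{"role": "system", "content": "You are a careful research agent."}]

out_dir = tempfile.mkdtemp()
old_path = os.path.join(out_dir, "old.json")
new_path = os.path.join(out_dir, "new.json")
save_messages(old_path, old)
save_messages(new_path, new)
print(old_path, new_path)
```

The two printed paths can then be passed to python3 -m prompt_diff, and a hook or CI step can branch on the exit code.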
Context lines
The default is 3 lines of context around each change, matching standard diff tools. For very long prompts where you want more context:
diff = diff_prompts(old, new, context=10)
Or for a minimal view that shows only the changed lines:
diff = diff_prompts(old, new, context=0)
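To show what the context setting changes, the sketch below calls difflib.unified_diff directly with different values of its n parameter. It assumes prompt-diff forwards context to n, which seems likely given the stdlib dependency but is not confirmed here. The ten-line sample prompt and the helper body_lines are made up for the demonstration.

```python
import difflib

old_lines = [f"rule {i}" for i in range(1, 11)]
new_lines = list(old_lines)
new_lines[4] = "rule 5 (revised)"  # change a single line in the middle


def body_lines(n: int) -> list[str]:
    """Diff lines after the two file headers, using n lines of context."""
    diff = difflib.unified_diff(old_lines, new_lines, n=n, lineterm="")
    return list(diff)[2:]  # drop the '---' and '+++' header lines


minimal = body_lines(0)  # hunk header plus the removed and added line
default = body_lines(3)  # the same change with 3 context lines on each side
print(len(minimal), len(default))  # 3 9
```

With n=0 the hunk contains only the changed lines, while n=3 surrounds them with three unchanged lines before and after.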
Technical notes
22 tests. Zero runtime dependencies — uses Python's difflib.unified_diff from the stdlib. Python 3.10+. The test suite covers identical messages, added messages, removed messages, role changes, Anthropic content block normalization, None content, color rendering, context line count, and the combined case with mixed same/changed/added messages.
Repo: https://github.com/MukundaKatta/prompt-diff
pip install prompt-diff