
Mukunda Rao Katta

I changed one line in my Hermes system prompt. prompt-diff showed me exactly what.

Hermes Agent Challenge Submission: Write About Hermes Agent

This is a submission for the Hermes Agent Challenge.

I have a Hermes orchestrator agent with a 600-line system prompt. Last week I made what I thought was a small change: I added a paragraph about when to delegate to a sub-agent. The model's behavior shifted noticeably, so I wanted a post-mortem. What exactly did I change?

I had the old prompt in a variable in my code. The new one was in memory. I could not git diff them because they were Python strings. I could print them both and scroll up and down. That does not work for 600 lines.

prompt-diff is what I wanted.

Basic usage

from prompt_diff import diff_prompts, render_diff

old = [
    {"role": "system", "content": "You are a research agent.\n\nWhen you need data, call search_web.\nWhen you have enough data, call write_report."},
    {"role": "user", "content": "Research the Hermes model."},
]

new = [
    {"role": "system", "content": "You are a research agent.\n\nWhen you need data, call search_web.\nWhen you have enough data, call write_report.\nWhen the task is complex, delegate to a sub-agent using spawn_worker."},
    {"role": "user", "content": "Research the Hermes model."},
]

diff = diff_prompts(old, new)
print(render_diff(diff))

Output:

[system] (changed)
  --- system (old)
  +++ system (new)
  @@ -1,4 +1,5 @@
   You are a research agent.

   When you need data, call search_web.
   When you have enough data, call write_report.
  +When the task is complex, delegate to a sub-agent using spawn_worker.


That is exactly the unified diff format, per message. The [system] header tells you which message slot changed. The rest is standard + and - lines.
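To make the format concrete, here is a minimal sketch of a per-message unified diff built directly on the stdlib's difflib.unified_diff. It is not prompt-diff's implementation: the helper name diff_message_pair, the header labels, and the sample prompt text are assumptions made for illustration.

```python
import difflib


def diff_message_pair(role: str, old_text: str, new_text: str, context: int = 3) -> list[str]:
    """Return unified-diff lines for one message slot (hypothetical helper)."""
    return list(
        difflib.unified_diff(
            old_text.splitlines(),
            new_text.splitlines(),
            fromfile=f"{role} (old)",
            tofile=f"{role} (new)",
            n=context,
            lineterm="",  # input lines carry no newline, so headers should not either
        )
    )


old_text = "You are a helpful agent.\nAnswer briefly."
new_text = "You are a helpful agent.\nAnswer briefly.\nCite your sources."

lines = diff_message_pair("system", old_text, new_text)
print("\n".join(lines))
```

Identical inputs yield an empty list, which is a convenient signal for "this message slot did not change".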

The data model

diff = diff_prompts(old, new)

diff.any_changed       # True
diff.changed_count     # 1
diff.added_count       # 0 — no new message slots
diff.same_count        # 1 — user message unchanged

for md in diff.messages:
    md.status          # "same" | "changed" | "added" | "removed"
    md.role            # "system", "user", "assistant"
    md.old_content     # original string
    md.new_content     # new string
    md.unified_lines   # list of diff lines (for changed messages)

You can use the structured result to build your own report, log the diff to a JSONL trace, or run it in a test that asserts the prompt did not change between deploys.
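As one way to do the JSONL logging, the sketch below flattens a diff result into a plain dict and appends it as one line of JSON. It reads only the attributes listed above (any_changed, messages, status, role, unified_lines). The helper names diff_to_record and append_jsonl are hypothetical, and a SimpleNamespace stands in for the real diff object so the example runs without prompt-diff installed.

```python
import json
from types import SimpleNamespace


def diff_to_record(diff) -> dict:
    """Flatten a diff result into a JSON-serializable dict (hypothetical helper)."""
    return {
        "any_changed": diff.any_changed,
        "messages": [
            {
                "status": md.status,
                "role": md.role,
                "unified_lines": list(md.unified_lines or []),
            }
            for md in diff.messages
            if md.status != "same"  # skip unchanged slots to keep the trace small
        ],
    }


def append_jsonl(path: str, record: dict) -> None:
    """Append one record to a JSONL file as a single line."""
    with open(path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps(record, ensure_ascii=False) + "\n")


# Stand-in for the object returned by diff_prompts (assumption for the demo).
fake_diff = SimpleNamespace(
    any_changed=True,
    messages=[
        SimpleNamespace(status="changed", role="system", unified_lines=["+new line"]),
        SimpleNamespace(status="same", role="user", unified_lines=[]),
    ],
)

record = diff_to_record(fake_diff)
print(json.dumps(record))
```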

Detecting prompt drift in tests

That last use case is one I actually use: I have a test that asserts the system prompt in prod has not changed since the last approved version. If someone edits the prompt without updating the golden snapshot, the test fails and shows exactly what changed:

import pytest
from prompt_diff import diff_prompts, render_diff

# GOLDEN_MESSAGES and get_current_messages() come from your own project.
def test_system_prompt_unchanged():
    diff = diff_prompts(GOLDEN_MESSAGES, get_current_messages())
    if diff.any_changed:
        pytest.fail(f"System prompt changed:\n{render_diff(diff)}")

This is the kind of drift detection that catches "I fixed a typo" commits that accidentally introduce a different instruction.
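The test above assumes a golden snapshot exists somewhere. One simple option is to keep it as a JSON file in the repo so that updating it shows up in code review. A minimal sketch follows; the helper names save_golden and load_golden and the file location are hypothetical, not part of prompt-diff.

```python
import json
from pathlib import Path


def save_golden(path: Path, messages: list[dict]) -> None:
    """Write the approved message list in a stable, review-friendly layout."""
    path.write_text(
        json.dumps(messages, indent=2, ensure_ascii=False) + "\n",
        encoding="utf-8",
    )


def load_golden(path: Path) -> list[dict]:
    """Read the approved message list back from disk."""
    return json.loads(path.read_text(encoding="utf-8"))
```

In the test module, GOLDEN_MESSAGES could then be assigned from load_golden(Path("tests/golden_prompt.json")), where that path is only an example. Approving a prompt change becomes a deliberate save_golden call plus a commit.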

Handles Anthropic content blocks

If your messages use "content": [{"type": "text", "text": "..."}] instead of a plain string, prompt-diff normalizes them before diffing. Both shapes work:

# String content (OpenAI style)
{"role": "user", "content": "hello"}

# Content block array (Anthropic style)
{"role": "user", "content": [{"type": "text", "text": "hello"}]}
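For readers curious what such a normalization step might look like, here is a rough sketch. It is an assumption about the general approach, not the library's code: the function name normalize_content is hypothetical, and joining multiple text blocks with a newline and treating None as an empty string are choices made for the example.

```python
def normalize_content(content) -> str:
    """Collapse message content to a plain string (illustrative only)."""
    if content is None:
        return ""
    if isinstance(content, str):
        return content
    # Otherwise assume a list of content blocks; keep the text ones, in order.
    parts = [
        block.get("text", "")
        for block in content
        if isinstance(block, dict) and block.get("type") == "text"
    ]
    return "\n".join(parts)


print(normalize_content("hello") == normalize_content([{"type": "text", "text": "hello"}]))
```

Once both shapes reduce to the same string, a line-based diff treats them identically.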

CLI

For quick one-off diffs from the command line, save your message lists as JSON files:

python3 -m prompt_diff old.json new.json
python3 -m prompt_diff old.json new.json --color --show-same

The exit code is 1 when there are differences (like diff) and 0 when the inputs are identical, which makes it easy to wire into a pre-commit hook or CI check.
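A CI step could wrap the command like this. This is a sketch under assumptions: it relies only on the exit codes described above (0 and 1), treats any other exit code as a tool failure, and uses a hypothetical helper named prompts_differ. The run parameter exists only so the function can be exercised without the package installed.

```python
import subprocess
import sys


def prompts_differ(old_path: str, new_path: str, run=subprocess.run) -> bool:
    """Run the prompt_diff CLI and report whether it found differences."""
    result = run(
        [sys.executable, "-m", "prompt_diff", old_path, new_path],
        capture_output=True,
        text=True,
    )
    if result.returncode == 0:
        return False  # identical
    if result.returncode == 1:
        print(result.stdout)  # show the diff in the CI log
        return True
    # Any other code is treated as a failure of the tool itself (assumption).
    raise RuntimeError(f"prompt_diff failed: {result.stderr}")
```

A pre-commit script would call prompts_differ("old.json", "new.json") and exit non-zero when it returns True.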

Context lines

The default is 3 lines of context around each change, matching standard diff tools. For very long prompts where you want more context:

diff = diff_prompts(old, new, context=10)

Or for a minimal view that shows only the changed lines:

diff = diff_prompts(old, new, context=0)
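To see what the context count does to the output, the sketch below calls difflib.unified_diff directly with different values of its n parameter. It assumes, without having checked the source, that prompt-diff's context argument is passed through as n. The ten-line sample prompt is made up.

```python
import difflib

old_lines = [f"rule {i}" for i in range(1, 11)]
new_lines = old_lines.copy()
new_lines[4] = "rule 5 (revised)"  # change one line in the middle


def body_lines(context: int) -> list[str]:
    """Unified-diff lines after the two file-header lines."""
    diff = difflib.unified_diff(old_lines, new_lines, n=context, lineterm="")
    return list(diff)[2:]


for n in (0, 3):
    print(f"context={n}")
    print("\n".join(body_lines(n)))
```

With n=0 the hunk contains only the removed and added line; with n=3 it also carries three unchanged lines on each side.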

Technical notes

The package has 22 tests and zero runtime dependencies: it uses Python's difflib.unified_diff from the stdlib. It requires Python 3.10+. The test suite covers identical messages, added messages, removed messages, role changes, Anthropic content block normalization, None content, color rendering, context line count, and the combined case with mixed same/changed/added messages.

Repo: https://github.com/MukundaKatta/prompt-diff

pip install prompt-diff
