Ertugrul
PromptLedger v0.3 — Turning prompt history into a practical review workflow.

Devlog — Part 3

Turning prompt history into a practical review workflow.


In Part 1, I introduced PromptLedger as a deliberately small, local-first tool for treating prompts like code.

In Part 2, I added release semantics: labels, label history, and status views that made it easier to answer questions like “what is in production right now?”

With v0.3, the next question became harder:

Even if I can diff two prompt versions, can I review them in a way that feels closer to a real release workflow?

That is the focus of this release.

PromptLedger v0.3 adds a small but practical Prompt Review layer on top of the existing history model — while still staying local-first, SQLite-backed, and intentionally limited in scope.


Why a third part?

After the release semantics work in v0.2, the project could already answer questions like:

  • Which prompt does prod currently point to?
  • When was that label changed?
  • How does prod differ from staging?

But another gap became obvious.

A raw diff is useful, but in practice people often want a slightly higher-level review:

  • Did the prompt become stricter?
  • Did the tone change?
  • Was the output format changed from bullets to JSON?
  • Did safety or refusal wording get stronger or weaker?
  • Is this a release change or a likely regression risk?

Those are not execution questions. They are not observability questions either.

They are review questions.

So instead of adding prompt execution, external APIs, or any hosted layer, I kept the project focused and added a review workflow built entirely on top of the existing local data.


The main addition: review

The new command is:

promptledger review --id onboarding --from prod --to staging

This compares two refs — versions or labels — and produces a structured review output that includes:

  • resolved refs and versions
  • a semantic summary
  • metadata changes
  • label context
  • warning flags
  • a few conservative notes

This is deliberately not an evaluation system. It does not score prompts. It does not call a model. It does not guess too much.

It simply makes a prompt diff easier to interpret.


From line diff to semantic summary

Traditional diffs are still useful, and PromptLedger keeps all previous diff modes.

But v0.3 adds a new summary-oriented mode:

promptledger diff --id onboarding --from 7 --to 9 --mode summary

This produces a heuristic, rule-based semantic summary instead of a raw line diff.

The important design decision here is that the summary is:

  • local
  • deterministic
  • transparent
  • intentionally conservative

In other words: it only says something when the change looks clear enough.

Current summary categories include:

  • tone changes
  • tighter or looser constraints
  • output format changes
  • broader vs more specific prompts
  • safety wording changes
  • length requirement changes
  • refusal or policy wording changes

This is not meant to replace reading the actual prompt.
It is meant to make review faster and more structured.


Why heuristics instead of an LLM?

Because using an external model for review would push the project in exactly the wrong direction.

It would introduce:

  • network dependence
  • nondeterministic behavior
  • more configuration
  • harder testing
  • less trust in the output

PromptLedger is supposed to be inspectable.
If it says “constraints tightened”, that should come from understandable rules, not hidden inference.

That made a heuristic system the better fit.

It is not as flexible as an LLM-based reviewer, but it is much easier to reason about — and much more aligned with the philosophy of the project.


Reviews now export cleanly to markdown

Another practical gap in earlier versions was sharing review output.

Reading a diff in the terminal is fine.
Sharing it in a PR, issue, or internal document is another matter.

So v0.3 adds markdown export for reviews:

promptledger export review --id onboarding --from prod --to staging --format md --out review.md

The exported markdown is deterministic and structured.
It includes:

  • a title
  • compared refs
  • semantic summary
  • text diff note
  • metadata changes
  • warnings
  • label information
  • a reviewer notes placeholder
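A deterministic markdown renderer for such a structure is straightforward. The sketch below assumes a plain dict as input; the real export format and field names in PromptLedger may differ.

```python
# Illustrative deterministic markdown rendering for a review result.
# The dict keys ("id", "from_ref", "summary", "warnings") are assumptions.

def render_review_md(review: dict) -> str:
    lines = [
        f"# Prompt Review: {review['id']}",
        "",
        f"Compared: `{review['from_ref']}` -> `{review['to_ref']}`",
        "",
        "## Semantic summary",
    ]
    lines += [f"- {item}" for item in review["summary"]] or ["- (no clear changes)"]
    lines += ["", "## Warnings"]
    lines += [f"- {w}" for w in review["warnings"]] or ["- none"]
    lines += ["", "## Reviewer notes", "", "_(add your notes here)_"]
    return "\n".join(lines)
```

Because the output depends only on the input structure, the same review always exports to the same bytes, which keeps the files diff-friendly in git.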

That makes PromptLedger more useful in real workflows without adding any collaboration backend.

The file is still just a file.
You can paste it into GitHub, attach it to docs, or keep it locally.


Metadata changes are now first-class in reviews

Prompt text is only part of the story.

A release change may also involve metadata updates:

  • reason
  • author
  • tags
  • env
  • metrics

Earlier versions could already diff metadata, but v0.3 makes metadata changes part of the review object itself.

That matters because some changes are metadata-only.
In those cases, PromptLedger can now say that clearly instead of pretending there was meaningful prompt drift.

This is a small feature, but an important one.
It avoids overclaiming, which is one of the easiest ways to make a review tool feel unreliable.


Warning flags and likely drift hotspots

Prompt review is not just about summarizing what changed.
It is also about drawing attention to changes that deserve extra care.

v0.3 adds simple warning flags for cases such as:

  • comparing the same version to itself
  • environment changes
  • metadata-only changes
  • policy or refusal wording changes that may affect behavior drift

These warnings are not meant to be dramatic.
They are meant to make the review output more useful in practice.

For example, a wording change around refusal or safety does not automatically mean the prompt got worse — but it probably means a reviewer should read it more carefully.
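The warning cases listed above can be expressed as a few plain predicates. This is a hypothetical sketch in that spirit; the function signature, term list, and exact wording are assumptions, not PromptLedger's code.

```python
# Hypothetical warning-flag rules; names and terms are illustrative.

POLICY_TERMS = ("refuse", "refusal", "policy", "safety", "must not")

def review_warnings(from_ref: str, to_ref: str,
                    text_changed: bool, meta_changed: bool,
                    old_text: str, new_text: str) -> list[str]:
    warnings = []
    if from_ref == to_ref:
        warnings.append("comparing a version to itself")
    if meta_changed and not text_changed:
        warnings.append("metadata-only change")
    # Flag any shift in policy/refusal vocabulary for closer reading.
    old_hits = sum(term in old_text.lower() for term in POLICY_TERMS)
    new_hits = sum(term in new_text.lower() for term in POLICY_TERMS)
    if old_hits != new_hits:
        warnings.append("refusal/policy wording changed; review carefully")
    return warnings
```

Note that the policy check does not judge whether the change is good or bad; it only marks the span as worth a careful read, matching the non-dramatic intent described above.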


The Python API now returns structured review objects

The review workflow is not just a CLI feature.

The Python API now exposes review results as structured domain objects rather than just formatted strings.

That means callers can programmatically access:

  • resolved refs
  • semantic summary items
  • metadata changes
  • warnings
  • notes
  • label context

This keeps the CLI and the API aligned while also making formatting a separate concern.

That separation turned out to be one of the cleaner changes in this version:

  • review logic lives in one place
  • rendering logic lives elsewhere
  • markdown export and terminal rendering are both built on the same review result

Small project, but still worth keeping modular.
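As a rough illustration of that shape, a structured review object might look like the dataclass below. The type and field names here are my assumptions for the sketch, not the library's actual API.

```python
# Hypothetical shape of a structured review result (illustrative only).
from dataclasses import dataclass, field

@dataclass
class ReviewResult:
    from_ref: str                                  # e.g. "prod" or "7"
    to_ref: str                                    # e.g. "staging" or "9"
    summary: list[str] = field(default_factory=list)
    metadata_changes: dict[str, tuple] = field(default_factory=dict)
    warnings: list[str] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)

# Callers branch on structure instead of parsing formatted strings:
result = ReviewResult(from_ref="prod", to_ref="staging",
                      summary=["output format changed to JSON"])
needs_attention = bool(result.warnings)
```

With a shared object like this, the terminal renderer and the markdown exporter become thin formatting layers over the same data, which is exactly the separation the post describes.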


UI update: review without write access

The Streamlit UI is still read-only.
That did not change.

What changed is that the comparison view now surfaces review information more clearly:

  • semantic summary
  • warnings
  • metadata diff
  • side-by-side prompt comparison
  • line diff

This keeps the UI aligned with the CLI review flow without turning it into an editor.

That constraint still matters.
The UI is there to inspect history, not to mutate it.


What did not change

Just as important as the new features is what was left out.

v0.3 does not add:

  • a hosted registry
  • prompt execution APIs
  • agent tooling
  • telemetry pipelines
  • tracing dashboards
  • cloud sync
  • automatic scoring
  • evaluation harnesses

There are already plenty of tools going in those directions.

PromptLedger is still trying to do one narrower thing well:
store, compare, review, and export prompt changes locally.


No schema expansion was needed

One part of this release that I particularly liked: the review workflow did not require turning the database into something more complicated.

SQLite remains the single source of truth.
The review layer is generated from existing prompt versions, labels, and metadata.

That kept the implementation smaller and the migration story simpler.

Not every useful feature needs a bigger schema.
Sometimes the better move is to extract more value from the structure that is already there.


Closing

v0.3 did not try to make PromptLedger smarter in a flashy way.
It tried to make it more reviewable.

The result is still a local tool.
Still inspectable.
Still deterministic where possible.
Still intentionally limited.

But now it is easier to answer a more realistic question:

Not just “what changed?” — but “how should I review this change before I move it forward?”

That is a better place for the project to be.


Links

PyPI: PyPI
GitHub: GitHub
LinkedIn: LinkedIn
Website: Website
