Ertugrul
PromptLedger v0.3 — Turning prompt history into a practical review workflow.

Devlog — Part 3

Turning prompt history into a practical review workflow.


In Part 1, I introduced PromptLedger as a deliberately small, local-first tool for treating prompts like code.

In Part 2, I added release semantics: labels, label history, and status views that made it easier to answer questions like “what is in production right now?”

With v0.3, the next question became harder:

Even if I can diff two prompt versions, can I review them in a way that feels closer to a real release workflow?

That is the focus of this release.

PromptLedger v0.3 adds a small but practical Prompt Review layer on top of the existing history model — while still staying local-first, SQLite-backed, and intentionally limited in scope.


Why a third part?

After the release semantics work in v0.2, the project could already answer questions like:

  • Which prompt does prod currently point to?
  • When was that label changed?
  • How does prod differ from staging?

But another gap became obvious.

A raw diff is useful, but in practice people often want a slightly higher-level review:

  • Did the prompt become stricter?
  • Did the tone change?
  • Was the output format changed from bullets to JSON?
  • Did safety or refusal wording get stronger or weaker?
  • Is this a release change or a likely regression risk?

Those are not execution questions. They are not observability questions either.

They are review questions.

So instead of adding prompt execution, external APIs, or any hosted layer, I kept the project focused and added a review workflow built entirely on top of the existing local data.


The main addition: review

The new command is:

promptledger review --id onboarding --from prod --to staging

This compares two refs — versions or labels — and produces a structured review output that includes:

  • resolved refs and versions
  • a semantic summary
  • metadata changes
  • label context
  • warning flags
  • a few conservative notes

This is deliberately not an evaluation system. It does not score prompts. It does not call a model. It does not guess too much.

It simply makes a prompt diff easier to interpret.


From line diff to semantic summary

Traditional diffs are still useful, and PromptLedger keeps all previous diff modes.

But v0.3 adds a new summary-oriented mode:

promptledger diff --id onboarding --from 7 --to 9 --mode summary

This produces a heuristic, rule-based semantic summary instead of a raw line diff.

The important design decision here is that the summary is:

  • local
  • deterministic
  • transparent
  • intentionally conservative

In other words: it only says something when the change looks clear enough.

Current summary categories include:

  • tone changes
  • tighter or looser constraints
  • output format changes
  • broader vs more specific prompts
  • safety wording changes
  • length requirement changes
  • refusal or policy wording changes

This is not meant to replace reading the actual prompt.
It is meant to make review faster and more structured.


Why heuristics instead of an LLM?

Because using an external model for review would push the project in exactly the wrong direction.

It would introduce:

  • network dependence
  • nondeterministic behavior
  • more configuration
  • harder testing
  • less trust in the output

PromptLedger is supposed to be inspectable.
If it says “constraints tightened”, that should come from understandable rules, not hidden inference.

That made a heuristic system the better fit.

It is not as flexible as an LLM-based reviewer, but it is much easier to reason about — and much more aligned with the philosophy of the project.


Reviews now export cleanly to markdown

Another practical gap in earlier versions was sharing review output.

Reading a diff in the terminal is fine.
Sharing it in a PR, issue, or internal document is another matter.

So v0.3 adds markdown export for reviews:

promptledger export review --id onboarding --from prod --to staging --format md --out review.md

The exported markdown is deterministic and structured.
It includes:

  • a title
  • compared refs
  • semantic summary
  • text diff note
  • metadata changes
  • warnings
  • label information
  • a reviewer notes placeholder
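A deterministic markdown renderer for such a structure is straightforward. The sketch below assumes a plain dict as input; the real export format and field names in PromptLedger may differ.

```python
# Illustrative deterministic markdown rendering for a review result.
# The dict keys ("id", "from_ref", "summary", "warnings") are assumptions.

def render_review_md(review: dict) -> str:
    lines = [
        f"# Prompt Review: {review['id']}",
        "",
        f"Compared: `{review['from_ref']}` -> `{review['to_ref']}`",
        "",
        "## Semantic summary",
    ]
    lines += [f"- {item}" for item in review["summary"]] or ["- (no clear changes)"]
    lines += ["", "## Warnings"]
    lines += [f"- {w}" for w in review["warnings"]] or ["- none"]
    lines += ["", "## Reviewer notes", "", "_(add your notes here)_"]
    return "\n".join(lines)
```

Because the output depends only on the input structure, the same review always exports to the same bytes, which keeps the files diff-friendly in git.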

That makes PromptLedger more useful in real workflows without adding any collaboration backend.

The file is still just a file.
You can paste it into GitHub, attach it to docs, or keep it locally.


Metadata changes are now first-class in reviews

Prompt text is only part of the story.

A release change may also involve metadata updates:

  • reason
  • author
  • tags
  • env
  • metrics

Earlier versions could already diff metadata, but v0.3 makes metadata changes part of the review object itself.

That matters because some changes are metadata-only.
In those cases, PromptLedger can now say that clearly instead of pretending there was meaningful prompt drift.

This is a small feature, but an important one.
It avoids overclaiming, which is one of the easiest ways to make a review tool feel unreliable.


Warning flags and likely drift hotspots

Prompt review is not just about summarizing what changed.
It is also about drawing attention to changes that deserve extra care.

v0.3 adds simple warning flags for cases such as:

  • comparing the same version to itself
  • environment changes
  • metadata-only changes
  • policy or refusal wording changes that may affect behavior drift

These warnings are not meant to be dramatic.
They are meant to make the review output more useful in practice.

For example, a wording change around refusal or safety does not automatically mean the prompt got worse — but it probably means a reviewer should read it more carefully.
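The warning cases listed above can be expressed as a few plain predicates. This is a hypothetical sketch in that spirit; the function signature, term list, and exact wording are assumptions, not PromptLedger's code.

```python
# Hypothetical warning-flag rules; names and terms are illustrative.

POLICY_TERMS = ("refuse", "refusal", "policy", "safety", "must not")

def review_warnings(from_ref: str, to_ref: str,
                    text_changed: bool, meta_changed: bool,
                    old_text: str, new_text: str) -> list[str]:
    warnings = []
    if from_ref == to_ref:
        warnings.append("comparing a version to itself")
    if meta_changed and not text_changed:
        warnings.append("metadata-only change")
    # Flag any shift in policy/refusal vocabulary for closer reading.
    old_hits = sum(term in old_text.lower() for term in POLICY_TERMS)
    new_hits = sum(term in new_text.lower() for term in POLICY_TERMS)
    if old_hits != new_hits:
        warnings.append("refusal/policy wording changed; review carefully")
    return warnings
```

Note that the policy check does not judge whether the change is good or bad; it only marks the span as worth a careful read, matching the non-dramatic intent described above.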


The Python API now returns structured review objects

The review workflow is not just a CLI feature.

The Python API now exposes review results as structured domain objects rather than just formatted strings.

That means callers can programmatically access:

  • resolved refs
  • semantic summary items
  • metadata changes
  • warnings
  • notes
  • label context

This keeps the CLI and the API aligned while also making formatting a separate concern.

That separation turned out to be one of the cleaner changes in this version:

  • review logic lives in one place
  • rendering logic lives elsewhere
  • markdown export and terminal rendering are both built on the same review result

Small project, but still worth keeping modular.
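As a rough illustration of that shape, a structured review object might look like the dataclass below. The type and field names here are my assumptions for the sketch, not the library's actual API.

```python
# Hypothetical shape of a structured review result (illustrative only).
from dataclasses import dataclass, field

@dataclass
class ReviewResult:
    from_ref: str                                  # e.g. "prod" or "7"
    to_ref: str                                    # e.g. "staging" or "9"
    summary: list[str] = field(default_factory=list)
    metadata_changes: dict[str, tuple] = field(default_factory=dict)
    warnings: list[str] = field(default_factory=list)
    notes: list[str] = field(default_factory=list)

# Callers branch on structure instead of parsing formatted strings:
result = ReviewResult(from_ref="prod", to_ref="staging",
                      summary=["output format changed to JSON"])
needs_attention = bool(result.warnings)
```

With a shared object like this, the terminal renderer and the markdown exporter become thin formatting layers over the same data, which is exactly the separation the post describes.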


UI update: review without write access

The Streamlit UI is still read-only.
That did not change.

What changed is that the comparison view now surfaces review information more clearly:

  • semantic summary
  • warnings
  • metadata diff
  • side-by-side prompt comparison
  • line diff

This keeps the UI aligned with the CLI review flow without turning it into an editor.

That constraint still matters.
The UI is there to inspect history, not to mutate it.


What did not change

Just as important as the new features is what was left out.

v0.3 does not add:

  • a hosted registry
  • prompt execution APIs
  • agent tooling
  • telemetry pipelines
  • tracing dashboards
  • cloud sync
  • automatic scoring
  • evaluation harnesses

There are already plenty of tools going in those directions.

PromptLedger is still trying to do one narrower thing well:
store, compare, review, and export prompt changes locally.


No schema expansion was needed

One part of this release that I particularly liked: the review workflow did not require turning the database into something more complicated.

SQLite remains the single source of truth.
The review layer is generated from existing prompt versions, labels, and metadata.

That kept the implementation smaller and the migration story simpler.

Not every useful feature needs a bigger schema.
Sometimes the better move is to extract more value from the structure that is already there.


Closing

v0.3 did not try to make PromptLedger smarter in a flashy way.
It tried to make it more reviewable.

The result is still a local tool.
Still inspectable.
Still deterministic where possible.
Still intentionally limited.

But now it is easier to answer a more realistic question:

Not just “what changed?” — but “how should I review this change before I move it forward?”

That is a better place for the project to be.


Links

PyPI: PyPI
GitHub: GitHub
LinkedIn: LinkedIn
Website: Website
