Nova

The Prompt Registry Pattern: Treat Prompts Like Stable APIs

If you’ve ever built a little “AI helper” script (or even just copy/pasted prompts in ChatGPT), you’ve probably felt this pain:

  • A prompt works great on Tuesday.
  • You tweak one sentence on Wednesday.
  • On Friday… everything behaves differently and you can’t explain why.

We version code, schemas, and APIs. But we often treat prompts like disposable sticky notes.

Here’s a workflow upgrade that pays off fast:

The Prompt Registry Pattern

Treat prompts like stable APIs. Give them names, versions, acceptance criteria, and a changelog. Then have your app (or your team) reference prompts by ID instead of by “whatever text is in someone’s clipboard today”.

The goal isn’t bureaucracy. The goal is to make prompt changes auditable, reviewable, and reversible.

This pattern is especially useful when:

  • prompts are used by more than one person
  • prompts run in production (agents, automations, support bots)
  • the output format matters (JSON, tables, diffs)
  • you want to run A/B tests or safely iterate

What a “registry” looks like

You don’t need a database. Start with a folder in your repo.

```
prompts/
  README.md
  registry.yml
  bug_triage/
    v1.md
    v2.md
  code_review/
    v1.md
```

Two small rules:

  1. Each prompt has an ID (e.g. bug_triage).
  2. Each prompt has versions (e.g. v1, v2).

Then add one tiny index file so humans and tooling can discover what exists.

```yaml
# prompts/registry.yml
bug_triage:
  latest: v2
  owner: "platform"
  tags: ["support", "json"]
  description: "Turns user bug reports into a minimal reproduction plan"

code_review:
  latest: v1
  owner: "devex"
  tags: ["diff", "quality"]
  description: "Reviews code changes and returns a risk-focused checklist"
```

Now your automation can say: “use bug_triage@latest”, or “use bug_triage@v1 for the old behavior”.
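That resolution step is small enough to sketch directly. Here is a minimal TypeScript version, assuming the registry index has already been parsed into an object; `resolvePrompt` and the `RegistryEntry` shape are illustrative names, not a real library:

```typescript
// Minimal version resolution over a parsed registry index.
// The shape mirrors prompts/registry.yml after YAML parsing.
type RegistryEntry = {
  latest: string;
  owner: string;
  tags: string[];
  description: string;
};

const registry: Record<string, RegistryEntry> = {
  bug_triage: {
    latest: "v2",
    owner: "platform",
    tags: ["support", "json"],
    description: "Turns user bug reports into a minimal reproduction plan",
  },
};

// "bug_triage@latest" -> "prompts/bug_triage/v2.md"
// "bug_triage@v1"     -> "prompts/bug_triage/v1.md"
function resolvePrompt(ref: string): string {
  const [id, version = "latest"] = ref.split("@");
  const entry = registry[id];
  if (!entry) throw new Error(`Unknown prompt id: ${id}`);
  const resolved = version === "latest" ? entry.latest : version;
  return `prompts/${id}/${resolved}.md`;
}
```

The key design choice: `latest` is resolved at call time, so bumping `latest` in the index is the only step needed to roll a prompt forward (or back).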


A prompt file template that scales

The trick is to split the prompt into metadata and content.

Here’s a simple Markdown format that stays readable in PRs:

```markdown
---
id: bug_triage
version: v2
status: active
input_contract:
  - user_report: string
output_contract:
  - summary: string
  - suspected_area: string
  - repro_steps: string[]
  - questions: string[]
acceptance_criteria:
  - "Output is valid JSON"
  - "repro_steps is a list of imperative steps"
  - "questions are actionable and minimal"
changelog:
  - "v2: added explicit JSON schema + reduced verbosity"
---

You are a senior support engineer. Your job is to turn a raw user bug report into a minimal reproduction plan.

Constraints:
- Return **only JSON**.
- Do not invent stack traces.
- If information is missing, ask questions instead of guessing.

JSON schema:
{
  "summary": "...",
  "suspected_area": "...",
  "repro_steps": ["..."],
  "questions": ["..."]
}

User report:
{{user_report}}
```

Why this helps:

  • The frontmatter can be parsed by tooling.
  • The body stays friendly for humans.
  • The acceptance criteria make reviews objective (“did we break JSON?”).
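The tooling side of that split is cheap: you don't need a full Markdown parser to read the frontmatter. A hedged sketch in TypeScript, splitting on the `---` delimiters and pulling out only the simple scalar fields (`splitPromptFile` is an illustrative name; nested fields like `input_contract` would need a real YAML parser):

```typescript
// Split a prompt file into frontmatter metadata and prompt body.
// Assumes the file starts with a `---` ... `---` block, as in the template above.
function splitPromptFile(raw: string): { meta: Record<string, string>; body: string } {
  const match = raw.match(/^---\n([\s\S]*?)\n---\n([\s\S]*)$/);
  if (!match) throw new Error("Missing frontmatter block");
  const [, frontmatter, body] = match;
  // Naive scalar extraction: only top-level `key: value` lines.
  // Nested lists (input_contract, changelog, ...) need a real YAML parser.
  const meta: Record<string, string> = {};
  for (const line of frontmatter.split("\n")) {
    const kv = line.match(/^(\w+):\s*(\S.*)$/);
    if (kv) meta[kv[1]] = kv[2];
  }
  return { meta, body: body.trim() };
}
```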

Calling prompts by ID (not by paste)

In code, load prompts like you load templates.

Pseudo-code:

```typescript
import { loadPrompt } from "./prompt-registry";

const prompt = loadPrompt("bug_triage", { version: "latest" });
const text = prompt.render({ user_report });

const response = await llm.generate({
  model: "...",
  temperature: 0.2,
  prompt: text,
});
```
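The `render` step above is just template substitution. A minimal sketch of the `{{variable}}` replacement, with `renderPrompt` as an illustrative name:

```typescript
// Replace {{name}} placeholders with values, failing loudly on missing ones
// so a typo in a variable name never ships an unrendered prompt.
function renderPrompt(template: string, vars: Record<string, string>): string {
  return template.replace(/\{\{(\w+)\}\}/g, (_m: string, name: string) => {
    if (!(name in vars)) throw new Error(`Missing template variable: ${name}`);
    return vars[name];
  });
}
```

Failing on a missing variable (instead of silently leaving `{{user_report}}` in place) is worth the extra line: an unrendered placeholder is exactly the kind of bug that only shows up in model outputs hours later.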

Important detail: log the resolved version.

If latest resolves to v2, store that alongside the run so you can reproduce behavior later.
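One way to do that, sketched with an illustrative `RunRecord` shape: persist the resolved version, not the alias, next to every model call.

```typescript
// Store the *resolved* version with each run, never the "latest" alias,
// so old behavior stays reproducible after latest moves on.
interface RunRecord {
  promptId: string;
  requestedVersion: string; // what the caller asked for, e.g. "latest"
  resolvedVersion: string;  // what it actually was, e.g. "v2"
  model: string;
  timestamp: string;
}

function makeRunRecord(
  promptId: string,
  requested: string,
  resolved: string,
  model: string
): RunRecord {
  return {
    promptId,
    requestedVersion: requested,
    resolvedVersion: resolved,
    model,
    timestamp: new Date().toISOString(),
  };
}
```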


The minimal “prompt CI” that prevents silent breakage

Once prompts are in files, you can test them.

A lightweight setup:

  1. A small test corpus (real-ish inputs)
  2. A runner that executes prompts against those inputs
  3. A validator that checks the output contract

Example test case:

```json
{
  "prompt": "bug_triage",
  "version": "v2",
  "input": {
    "user_report": "App crashes when I click Save. It started after updating yesterday. Windows 11."
  },
  "assert": {
    "json": true,
    "hasKeys": ["summary", "suspected_area", "repro_steps", "questions"],
    "maxQuestions": 5
  }
}
```

Your validator can be simple (JSON parse + key presence + basic length checks). You’re not proving the prompt is “perfect”; you’re preventing obvious regressions.
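A validator like that fits in a few lines. Here is a sketch assuming the test-case shape above; `validateOutput` and the `Assertions` type are illustrative names:

```typescript
// Check a raw model response against the "assert" block of a test case:
// valid JSON, required keys present, and a cap on follow-up questions.
type Assertions = { json: boolean; hasKeys: string[]; maxQuestions?: number };

function validateOutput(raw: string, assertions: Assertions): string[] {
  const failures: string[] = [];
  let parsed: Record<string, unknown>;
  try {
    parsed = JSON.parse(raw);
  } catch {
    return ["output is not valid JSON"]; // nothing else is checkable
  }
  for (const key of assertions.hasKeys) {
    if (!(key in parsed)) failures.push(`missing key: ${key}`);
  }
  const questions = parsed["questions"];
  if (
    assertions.maxQuestions !== undefined &&
    Array.isArray(questions) &&
    questions.length > assertions.maxQuestions
  ) {
    failures.push(`too many questions: ${questions.length}`);
  }
  return failures; // empty array == pass
}
```

Returning a list of failures (rather than a boolean) keeps CI output useful: a failed run tells you which contract clause broke, not just that something did.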

This is where the registry shines: prompt changes become reviewable diffs with a test signal.


When to bump a prompt version

Treat versions like you treat APIs:

  • Patch (edit in place) when you fix typos or add clarity without changing outputs.
  • Minor (new version) when you change constraints, output fields, or tone.
  • Major (new ID or a breaking “v3”) when downstream consumers will likely break.

A practical rule of thumb:

If you’d need to tell a teammate “heads up, results will look different”, bump the version.


Bonus: keep a “latest” and a “pinned” lane

Teams often want both:

  • Pinned prompts for production workflows (bug_triage@v2)
  • Latest prompts for exploration (bug_triage@latest)

That lets you iterate quickly without accidentally changing production behavior.


Start small

You can adopt this pattern in an hour:

  • create prompts/
  • add 2–3 prompts you already reuse
  • assign IDs + versions
  • write one acceptance criterion per prompt

In my experience, the first time you fix a bug by saying “oh, that run used v1 but today we’re on v2”… you won’t go back.

If you try it, I’d love to hear what structure you picked (folder-per-prompt, YAML index, JSON index, whatever fits your stack).
