I got tired of prompts, system instructions, model settings, tools, and input context strings scattered across my app, so I built PromptOpsKit (or see this website)
npm i promptopskit
If you build AI features into a real product, you probably already have prompt operations.
You just don’t have them in one place.
A typical feature ends up spreading behavior across:
- a prompt string in one file
- model settings in another
- environment-specific behavior in conditionals
- runtime application data injected ad hoc
- repeated instructions copied across features
- provider-specific request shapes mixed into app code
That works for a while.
But eventually it gets harder to review, reuse, validate, and change safely.
I wanted a repo-native way to treat prompt behavior as part of the application itself, so I built PromptOpsKit: an open-source npm library for defining prompts, model settings, context inputs, validation rules, defaults, and overrides as structured assets in the codebase.
It’s not a hosted prompt dashboard.
It’s not an eval platform.
It’s not trying to own your transport layer.
It’s a way to make prompt behavior easier to manage in the same place the rest of the app already lives: the repo.
Just want to see a demo? (Run it at 2x, I talk slowly.)
The problem I kept seeing
In simple demos, prompts look easy.
You put a string in code, call a model, and move on.
In a real app, that rarely stays simple.
The prompt is only part of the behavior. You also end up dealing with things like:
- model choice
- environment overrides
- tool definitions
- shared instructions
- provider-specific request shapes
- application data that has to be inserted safely at runtime
Over time, the “prompt” stops being just text.
It becomes a mix of instructions, configuration, validation, and runtime behavior.
But in a lot of codebases, it still gets managed like this:
```typescript
const systemPrompt = `
You are a code review assistant. Summarize pull requests concisely and clearly.
Summarize the following pull request:
${pullRequestBody}
`;

const request = {
  model: process.env.NODE_ENV === "development" ? "gpt-5.4-mini" : "gpt-5.4",
  messages: [
    { role: "system", content: systemPrompt }
  ]
};
```
This works at first.
But now application context is being shoved directly into the prompt with no real contract around it.
That creates a few problems:
- every feature invents its own interpolation pattern
- input validation is easy to forget
- prompt review gets mixed up with string-building code
- trimming and hardening are inconsistent
- sensitive content checks are ad hoc
- missing or malformed inputs often fail unclearly or silently
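The silent-failure case is worth spelling out. Here is a minimal self-contained sketch (the `buildPrompt` helper is hypothetical, not part of any library) showing how ad hoc interpolation lets a missing input ship to the model as the literal string `undefined` instead of raising an error:

```typescript
// Hypothetical helper illustrating ad hoc interpolation.
// Nothing validates that the input was actually provided.
function buildPrompt(pullRequestBody?: string): string {
  return `Summarize the following pull request:\n${pullRequestBody}`;
}

// If the caller forgets the variable, the template literal happily
// stringifies `undefined` and the broken prompt ships silently.
const prompt = buildPrompt(undefined);
console.log(prompt); // "Summarize the following pull request:\nundefined"
```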
That is the kind of mess I wanted to clean up.
What I wanted instead
I wanted the prompt asset to declare what runtime input it expects, and how that input should be validated before rendering.
In PromptOpsKit, that looks more like this:
```yaml
---
id: summarizePullRequest
schema_version: 1
environments:
  dev:
    model: gpt-5.4-mini
context:
  inputs:
    - name: pull_request_body
      max_size: 8000
      trim: both
      allow_regex:
        pattern: '\S'
      deny_regex:
        pattern: '(secret|api[_-]?key|password)'
        flags: 'i'
      return_message: "A secret was detected."
---
```
# System instructions
You are a code review assistant. Summarize pull requests concisely and clearly.
# Prompt template
Summarize the following pull request:
{{ pull_request_body }}
# Notes
This example demonstrates input hardening with byte trimming plus structured regular expressions, including an explicit case-insensitive flag for the denylist.
And then at runtime:
```typescript
const request = await openaiAdapter.renderPrompt(
  {
    path: "summarizePullRequest",
  },
  {
    environment,
    variables: {
      pull_request_body: pullRequestBody,
    },
    strict: true,
  },
);
```
That gives the prompt a clear runtime contract.
The prompt file declares:
- the input name
- its size limit
- how it should be trimmed
- what content is required
- what content should be rejected
- which environment overrides apply
And the application just provides the variable value when rendering.
That separation feels much cleaner.
The app still owns the business data.
The prompt owns the structure and validation expectations.
The renderer enforces the contract at runtime.
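To make the enforcement concrete, here is a simplified, self-contained sketch of the kind of hardening the declared rules imply. This mirrors the rules in the prompt file above (trim, non-empty allow pattern, denylist, size cap); it is an illustration of the contract, not PromptOpsKit's actual implementation:

```typescript
// Simplified sketch of an input contract like the one declared above.
interface InputRule {
  name: string;
  maxSize: number;        // max_size
  denyRegex: RegExp;      // deny_regex + flags
  returnMessage: string;  // return_message
}

function hardenInput(value: string, rule: InputRule): string {
  const trimmed = value.trim();                     // trim: both
  if (!/\S/.test(trimmed)) {                        // allow_regex: '\S'
    throw new Error(`${rule.name} must not be empty`);
  }
  if (rule.denyRegex.test(trimmed)) {               // deny_regex
    throw new Error(rule.returnMessage);
  }
  return trimmed.slice(0, rule.maxSize);            // max_size
}

const rule: InputRule = {
  name: "pull_request_body",
  maxSize: 8000,
  denyRegex: /(secret|api[_-]?key|password)/i,
  returnMessage: "A secret was detected.",
};
```

With `strict: true`, a failed rule surfaces as an error at render time instead of a quietly malformed prompt.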
Why that feels better
This is more than template substitution.
It means the prompt asset can define:
- what variables are expected
- how they are hardened
- what should fail fast
- what should render differently by environment
So instead of building prompts by manually stitching raw application data into strings, you get a structured runtime boundary between the app and the prompt.
That makes prompt behavior:
- easier to review
- easier to reuse
- easier to validate
- less brittle
- safer by default
That was one of the main reasons I built PromptOpsKit.
Why I wanted a repo-native approach
A lot of teams already ship software through:
- Git
- pull requests
- CI
- branches
- environments
- releases
That is already the operational workflow.
So for teams like that, it makes sense for prompt behavior to fit that same model.
I did not want a setup where prompt behavior lived in a separate control plane by default.
I wanted it to live in the codebase, with structure.
That means:
- the prompt stays close to the app
- changes are reviewable in PRs
- shared defaults are explicit
- environment behavior is visible
- runtime input rules are versioned
- the resulting payload can still be rendered cleanly for different providers
That was the goal behind PromptOpsKit.
What PromptOpsKit is
PromptOpsKit is an open-source library for authoring prompt assets in Markdown with metadata, then rendering them into provider-specific request payloads.
The idea is to keep the source format readable for developers, but structured enough to behave like a real application asset.
A prompt file can define things like:
- instructions
- model settings
- tools
- includes
- environment overrides
- context inputs
- validation and hardening rules
So instead of treating the prompt like a loose string literal, you can treat it like a packaged behavior definition.
The shift in mindset
The main idea behind PromptOpsKit is simple:
A prompt in a production app is usually not just text.
It is a behavior definition.
It includes:
- instructions
- settings
- tools
- context inputs
- validation expectations
- environment-specific behavior
- provider rendering concerns
Once I started thinking about prompts that way, it stopped making sense to manage them as isolated strings scattered through the app.
They needed more structure.
Not more ceremony.
Just better structure.
What I wanted it to handle
When building PromptOpsKit, I kept coming back to a few requirements.
1. Keep related behavior together
The prompt text, settings, and runtime input definitions should not be spread across random files unless there is a real reason.
2. Support shared instructions
Teams often repeat the same patterns:
- tone guidance
- safety guidance
- formatting rules
- tool usage guidance
That should be reusable.
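As a sketch of what reuse could look like in the same frontmatter format as the earlier example (the exact `includes` syntax here is an assumption for illustration, not confirmed library syntax), a prompt asset might reference shared instruction blocks by id:

```yaml
---
id: summarizeIssue
includes:
  - shared/tone-guidance
  - shared/safety-guidance
---
```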
3. Support defaults and overrides
Prompt behavior often varies by:
- environment
- customer tier
- deployment target
- experiment
Those differences should be explicit instead of buried in code branches.
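In frontmatter terms, that can look like defaults at the top level with explicit per-environment overrides, in the same shape as the earlier example (the `temperature` field here is an assumed setting for illustration):

```yaml
---
id: summarizePullRequest
model: gpt-5.4
temperature: 0.2
environments:
  dev:
    model: gpt-5.4-mini
  prod:
    temperature: 0
---
```

A reviewer can see every divergence from the default in one place, instead of hunting for `NODE_ENV` conditionals.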
4. Validate runtime inputs
If a prompt expects application context, that contract should be declared and enforced instead of left implicit.
5. Work with multiple providers
I wanted to keep the source prompt stable while still rendering request payloads for different providers.
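The reason this needs a rendering layer at all is that providers disagree on request shape. Here is a minimal self-contained sketch of the idea: one stable source asset, two provider payloads. The field names are simplified assumptions for illustration, not the library's output format:

```typescript
// One stable source asset.
interface PromptAsset {
  system: string;
  user: string;
  model: string;
}

// OpenAI-style APIs put the system prompt inside the messages array.
function toOpenAI(asset: PromptAsset) {
  return {
    model: asset.model,
    messages: [
      { role: "system", content: asset.system },
      { role: "user", content: asset.user },
    ],
  };
}

// Anthropic-style APIs take the system prompt as a top-level field.
function toAnthropic(asset: PromptAsset) {
  return {
    model: asset.model,
    system: asset.system,
    messages: [{ role: "user", content: asset.user }],
  };
}
```

The prompt author never has to think about these shapes; the adapter does.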
6. Validate in CI
If a prompt asset is malformed, missing required pieces, or using invalid references, I want that to fail early.
7. Compile for production
Readable source is great during development, but production apps often benefit from compiled artifacts.
What it is not
I think this part matters for open-source trust, so here is the direct version.
PromptOpsKit is not:
- a hosted prompt management SaaS
- a replacement for eval frameworks
- an observability product
- a gateway or proxy
- a transport SDK
You can still use whatever you want for:
- HTTP transport
- retries
- auth
- headers
- tracing
- evals
- analytics
PromptOpsKit is much narrower than that.
It is the repo-native layer for organizing and rendering prompt behavior.
That narrowness is intentional.
Why I think this matters
As soon as AI features become real product features, the way teams manage prompt behavior has to mature.
Not because prompts are magical.
Because once prompts affect customer experience, pricing tiers, tool access, or production behavior, they become operationally important.
At that point, teams need more than:
- multiline strings
- scattered config
- undocumented overrides
- duplicated instruction blocks
- ad hoc runtime interpolation
They need something they can:
- review
- validate
- reuse
- compile
- ship
- evolve safely
That is the gap I wanted to address.
Who I think this is for
PromptOpsKit is a good fit if:
- your prompts already live in application code
- you have more than one AI-powered feature
- you reuse instructions across prompts
- provider flexibility matters
- prompt behavior changes by environment
- application context needs to be injected safely at runtime
- your team already relies on Git and CI for shipping changes
It is probably less useful if:
- your main need is a hosted playground
- non-technical users are the primary authors
- your biggest challenge is eval orchestration rather than repo structure
- prompt behavior is intentionally managed outside the app release workflow
I think it is healthy to be clear about that.
Not every tool needs to be for everyone.
Why I’m sharing it
I am sharing PromptOpsKit because I think more teams are running into this problem now.
A lot of AI applications are moving past the demo phase.
That means prompt behavior starts needing the same kind of discipline as the rest of the codebase:
- clearer ownership
- safer changes
- less duplication
- more explicit contracts
- better reviewability
That is the problem space I am interested in.
PromptOpsKit is my attempt to make that workflow practical without forcing people into a separate hosted system.
The practical takeaway
Most teams do not need more prompts.
They need better structure around the prompts they already have.
For me, that means:
- keep prompt behavior in the repo
- define runtime inputs explicitly
- validate and harden context before rendering
- keep overrides visible
- stop burying important behavior in string assembly code
If your team already ships AI features through repos, PRs, CI, environments, and releases, prompt behavior should probably fit that workflow too.
And if your prompts are already in Git, the next step is not moving them into a mystery box somewhere else.
It is making them manageable.
Repo
If this matches the way your team is building AI features, the repo is here:
I’d genuinely love feedback from people managing prompts in real applications:
- what feels messy today
- what you wish was easier to review
- where your current prompt setup starts to break down
- what a repo-native workflow would need to support
If nothing else, I hope this helps push the conversation a bit beyond “where do I store my prompt string?” and toward “how should prompt behavior actually be managed in production apps?”