I got tired of prompts, system instructions, model settings, tools, and input context strings scattered across my app, so I built PromptOpsKit (or see this website)
npm i promptopskit
If you build AI features into a real product, you probably already have prompt operations.
You just don’t have them in one place.
A typical feature ends up spreading behavior across:
- a prompt string in one file
- model settings in another
- environment-specific behavior in conditionals
- runtime application data injected ad hoc
- repeated instructions copied across features
- provider-specific request shapes mixed into app code
That works for a while.
But eventually it gets harder to review, reuse, validate, and change safely.
I wanted a repo-native way to treat prompt behavior as part of the application itself, so I built PromptOpsKit: an open-source npm library for defining prompts, model settings, context inputs, validation rules, defaults, and overrides as structured assets in the codebase.
It’s not a hosted prompt dashboard.
It’s not an eval platform.
It’s not trying to own your transport layer.
It’s a way to make prompt behavior easier to manage in the same place the rest of the app already lives: the repo.
Just want to see a demo? (Run it at 2x, I talk slowly.)
The problem I kept seeing
In simple demos, prompts look easy.
You put a string in code, call a model, and move on.
In a real app, that rarely stays simple.
The prompt is only part of the behavior. You also end up dealing with things like:
- model choice
- environment overrides
- tool definitions
- shared instructions
- provider-specific request shapes
- application data that has to be inserted safely at runtime
Over time, the “prompt” stops being just text.
It becomes a mix of instructions, configuration, validation, and runtime behavior.
But in a lot of codebases, it still gets managed like this:
```typescript
const systemPrompt = `
You are a code review assistant. Summarize pull requests concisely and clearly.
Summarize the following pull request:
${pullRequestBody}
`;

const request = {
  model: process.env.NODE_ENV === "development" ? "gpt-5.4-mini" : "gpt-5.4",
  messages: [
    { role: "system", content: systemPrompt }
  ]
};
```
This works at first.
But now application context is being shoved directly into the prompt with no real contract around it.
That creates a few problems:
- every feature invents its own interpolation pattern
- input validation is easy to forget
- prompt review gets mixed up with string-building code
- trimming and hardening are inconsistent
- sensitive content checks are ad hoc
- missing or malformed inputs often fail unclearly or silently
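The silent-failure case is worth spelling out. Here is a minimal self-contained sketch (the `buildPrompt` helper is hypothetical, not part of any library) showing how ad hoc interpolation lets a missing input ship to the model as the literal string `undefined` instead of raising an error:

```typescript
// Hypothetical helper illustrating ad hoc interpolation.
// Nothing validates that the input was actually provided.
function buildPrompt(pullRequestBody?: string): string {
  return `Summarize the following pull request:\n${pullRequestBody}`;
}

// If the caller forgets the variable, the template literal happily
// stringifies `undefined` and the broken prompt ships silently.
const prompt = buildPrompt(undefined);
console.log(prompt); // "Summarize the following pull request:\nundefined"
```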
That is the kind of mess I wanted to clean up.
What I wanted instead
I wanted the prompt asset to declare what runtime input it expects, and how that input should be validated before rendering.
In PromptOpsKit, that looks more like this:
```yaml
---
id: summarizePullRequest
schema_version: 1
environments:
  dev:
    model: gpt-5.4-mini
context:
  inputs:
    - name: pull_request_body
      max_size: 8000
      trim: both
      allow_regex:
        pattern: '\S'
      deny_regex:
        pattern: '(secret|api[_-]?key|password)'
        flags: 'i'
      return_message: "A secret was detected."
---
```
# System instructions
You are a code review assistant. Summarize pull requests concisely and clearly.
# Prompt template
Summarize the following pull request:
{{ pull_request_body }}
# Notes
This example demonstrates input hardening with byte trimming plus structured regular expressions, including an explicit case-insensitive flag for the denylist.
And then at runtime:
```typescript
const request = await openaiAdapter.renderPrompt(
  {
    path: "summarizePullRequest",
  },
  {
    environment,
    variables: {
      pull_request_body: pullRequestBody,
    },
    strict: true,
  },
);
```
That gives the prompt a clear runtime contract.
The prompt file declares:
- the input name
- its size limit
- how it should be trimmed
- what content is required
- what content should be rejected
- which environment overrides apply
And the application just provides the variable value when rendering.
That separation feels much cleaner.
The app still owns the business data.
The prompt owns the structure and validation expectations.
The renderer enforces the contract at runtime.
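To make the enforcement concrete, here is a simplified, self-contained sketch of the kind of hardening the declared rules imply. This mirrors the rules in the prompt file above (trim, non-empty allow pattern, denylist, size cap); it is an illustration of the contract, not PromptOpsKit's actual implementation:

```typescript
// Simplified sketch of an input contract like the one declared above.
interface InputRule {
  name: string;
  maxSize: number;        // max_size
  denyRegex: RegExp;      // deny_regex + flags
  returnMessage: string;  // return_message
}

function hardenInput(value: string, rule: InputRule): string {
  const trimmed = value.trim();                     // trim: both
  if (!/\S/.test(trimmed)) {                        // allow_regex: '\S'
    throw new Error(`${rule.name} must not be empty`);
  }
  if (rule.denyRegex.test(trimmed)) {               // deny_regex
    throw new Error(rule.returnMessage);
  }
  return trimmed.slice(0, rule.maxSize);            // max_size
}

const rule: InputRule = {
  name: "pull_request_body",
  maxSize: 8000,
  denyRegex: /(secret|api[_-]?key|password)/i,
  returnMessage: "A secret was detected.",
};
```

With `strict: true`, a failed rule surfaces as an error at render time instead of a quietly malformed prompt.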
Why that feels better
This is more than template substitution.
It means the prompt asset can define:
- what variables are expected
- how they are hardened
- what should fail fast
- what should render differently by environment
So instead of building prompts by manually stitching raw application data into strings, you get a structured runtime boundary between the app and the prompt.
That makes prompt behavior:
- easier to review
- easier to reuse
- easier to validate
- less brittle
- safer by default
That was one of the main reasons I built PromptOpsKit.
Why I wanted a repo-native approach
A lot of teams already ship software through:
- Git
- pull requests
- CI
- branches
- environments
- releases
That is already the operational workflow.
So for teams like that, it makes sense for prompt behavior to fit that same model.
I did not want a setup where prompt behavior lived in a separate control plane by default.
I wanted it to live in the codebase, with structure.
That means:
- the prompt stays close to the app
- changes are reviewable in PRs
- shared defaults are explicit
- environment behavior is visible
- runtime input rules are versioned
- the resulting payload can still be rendered cleanly for different providers
That was the goal behind PromptOpsKit.
What PromptOpsKit is
PromptOpsKit is an open-source library for authoring prompt assets in Markdown with metadata, then rendering them into provider-specific request payloads.
The idea is to keep the source format readable for developers, but structured enough to behave like a real application asset.
A prompt file can define things like:
- instructions
- model settings
- tools
- includes
- environment overrides
- context inputs
- validation and hardening rules
So instead of treating the prompt like a loose string literal, you can treat it like a packaged behavior definition.
The shift in mindset
The main idea behind PromptOpsKit is simple:
A prompt in a production app is usually not just text.
It is a behavior definition.
It includes:
- instructions
- settings
- tools
- context inputs
- validation expectations
- environment-specific behavior
- provider rendering concerns
Once I started thinking about prompts that way, it stopped making sense to manage them as isolated strings scattered through the app.
They needed more structure.
Not more ceremony.
Just better structure.
What I wanted it to handle
When building PromptOpsKit, I kept coming back to a few requirements.
1. Keep related behavior together
The prompt text, settings, and runtime input definitions should not be spread across random files unless there is a real reason.
2. Support shared instructions
Teams often repeat the same patterns:
- tone guidance
- safety guidance
- formatting rules
- tool usage guidance
That should be reusable.
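As a sketch of what reuse could look like in the same frontmatter format as the earlier example (the exact `includes` syntax here is an assumption for illustration, not confirmed library syntax), a prompt asset might reference shared instruction blocks by id:

```yaml
---
id: summarizeIssue
includes:
  - shared/tone-guidance
  - shared/safety-guidance
---
```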
3. Support defaults and overrides
Prompt behavior often varies by:
- environment
- customer tier
- deployment target
- experiment
Those differences should be explicit instead of buried in code branches.
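In frontmatter terms, that can look like defaults at the top level with explicit per-environment overrides, in the same shape as the earlier example (the `temperature` field here is an assumed setting for illustration):

```yaml
---
id: summarizePullRequest
model: gpt-5.4
temperature: 0.2
environments:
  dev:
    model: gpt-5.4-mini
  prod:
    temperature: 0
---
```

A reviewer can see every divergence from the default in one place, instead of hunting for `NODE_ENV` conditionals.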
4. Validate runtime inputs
If a prompt expects application context, that contract should be declared and enforced instead of left implicit.
5. Work with multiple providers
I wanted to keep the source prompt stable while still rendering request payloads for different providers.
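The reason this needs a rendering layer at all is that providers disagree on request shape. Here is a minimal self-contained sketch of the idea: one stable source asset, two provider payloads. The field names are simplified assumptions for illustration, not the library's output format:

```typescript
// One stable source asset.
interface PromptAsset {
  system: string;
  user: string;
  model: string;
}

// OpenAI-style APIs put the system prompt inside the messages array.
function toOpenAI(asset: PromptAsset) {
  return {
    model: asset.model,
    messages: [
      { role: "system", content: asset.system },
      { role: "user", content: asset.user },
    ],
  };
}

// Anthropic-style APIs take the system prompt as a top-level field.
function toAnthropic(asset: PromptAsset) {
  return {
    model: asset.model,
    system: asset.system,
    messages: [{ role: "user", content: asset.user }],
  };
}
```

The prompt author never has to think about these shapes; the adapter does.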
6. Validate in CI
If a prompt asset is malformed, missing required pieces, or using invalid references, I want that to fail early.
7. Compile for production
Readable source is great during development, but production apps often benefit from compiled artifacts.
What it is not
I think this part matters for open-source trust, so here is the direct version.
PromptOpsKit is not:
- a hosted prompt management SaaS
- a replacement for eval frameworks
- an observability product
- a gateway or proxy
- a transport SDK
You can still use whatever you want for:
- HTTP transport
- retries
- auth
- headers
- tracing
- evals
- analytics
PromptOpsKit is much narrower than that.
It is the repo-native layer for organizing and rendering prompt behavior.
That narrowness is intentional.
Why I think this matters
As soon as AI features become real product features, the way teams manage prompt behavior has to mature.
Not because prompts are magical.
Because once prompts affect customer experience, pricing tiers, tool access, or production behavior, they become operationally important.
At that point, teams need more than:
- multiline strings
- scattered config
- undocumented overrides
- duplicated instruction blocks
- ad hoc runtime interpolation
They need something they can:
- review
- validate
- reuse
- compile
- ship
- evolve safely
That is the gap I wanted to address.
Who I think this is for
PromptOpsKit is a good fit if:
- your prompts already live in application code
- you have more than one AI-powered feature
- you reuse instructions across prompts
- provider flexibility matters
- prompt behavior changes by environment
- application context needs to be injected safely at runtime
- your team already relies on Git and CI for shipping changes
It is probably less useful if:
- your main need is a hosted playground
- non-technical users are the primary authors
- your biggest challenge is eval orchestration rather than repo structure
- prompt behavior is intentionally managed outside the app release workflow
I think it is healthy to be clear about that.
Not every tool needs to be for everyone.
Why I’m sharing it
I am sharing PromptOpsKit because I think more teams are running into this problem now.
A lot of AI applications are moving past the demo phase.
That means prompt behavior starts needing the same kind of discipline as the rest of the codebase:
- clearer ownership
- safer changes
- less duplication
- more explicit contracts
- better reviewability
That is the problem space I am interested in.
PromptOpsKit is my attempt to make that workflow practical without forcing people into a separate hosted system.
The practical takeaway
Most teams do not need more prompts.
They need better structure around the prompts they already have.
For me, that means:
- keep prompt behavior in the repo
- define runtime inputs explicitly
- validate and harden context before rendering
- keep overrides visible
- stop burying important behavior in string assembly code
If your team already ships AI features through repos, PRs, CI, environments, and releases, prompt behavior should probably fit that workflow too.
And if your prompts are already in Git, the next step is not moving them into a mystery box somewhere else.
It is making them manageable.
Repo
If this matches the way your team is building AI features, the repo is here:
I’d genuinely love feedback from people managing prompts in real applications:
- what feels messy today
- what you wish was easier to review
- where your current prompt setup starts to break down
- what a repo-native workflow would need to support
If nothing else, I hope this helps push the conversation a bit beyond “where do I store my prompt string?” and toward “how should prompt behavior actually be managed in production apps?”