Prompts used to be just text.
You wrote a few sentences, pasted them into a chat box, tweaked the wording, and moved on. If the output was not good, you tried again. Nothing else depended on it. The cost of getting it wrong was close to zero.
That phase did not last long. In The Turning Point of AI, I shared that:
AI isn’t just a tool for completing your sentences or suggesting the next line of code. We need to see it as a new way of building software.
The moment prompts moved out of chat windows and into those systems, their role changed. They were no longer throwaway text. They were reused across flows. They carried logic. They returned structured data that other parts of the system depended on.
At that point, prompts quietly became part of the system.
But the way we wrote them did not change. They were still plain text strings. Easy to write. Easy to paste. Easy to grow in the wrong direction.
Prompts as plain text, and the pain points
At first, this felt fine. A prompt was just a template string. Some variables. Maybe a bit of interpolation. It looked simple enough, and it worked for small use cases.
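In its simplest form, it looked something like this (a minimal sketch; the function name and fields are illustrative, not from any specific codebase):
// A typical early prompt: one template string, a couple of interpolated variables.
function buildPrompt(apiName: string, environment: string) {
  return `
You are a QA engineer. Generate a test plan for the ${apiName} API.
The target environment is ${environment}. Return the result as JSON.
`;
}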
Pressure 1: reuse
The first pressure came from reuse.
The same prompt started powering multiple features. Copy-paste became the default strategy. Someone copied it to make a small change. Someone else added another instruction for a different flow. Over time, the prompt logic drifted.
No one could confidently say which parts were shared, which parts were safe to change, and which parts would break something else. Ownership of the structure slowly disappeared.
Nothing broke immediately. That was the dangerous part.
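The drift usually looks harmless in isolation. Something like this (file names and wording are illustrative):
// feature-a/prompt.ts (the original)
export const TEST_PLAN_PROMPT = `
You are a QA engineer. Generate a test plan for the given API. Return only JSON.
`;

// feature-b/prompt.ts (copied later, then adjusted for a different flow)
export const TEST_PLAN_PROMPT = `
You are a QA engineer. Generate a test plan for the given API.
Focus on auth and rate limiting. Return only JSON without markdown.
`;
Two copies, two slightly different contracts, and no single place that owns the shared parts.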
Pressure 2: logic
The second pressure came from logic.
As soon as prompts needed to behave differently based on context, teams usually took one of two approaches.
Approach 1: branching in code and concatenating strings
This pattern is common. The system decides which instructions to include, and the prompt is assembled step by step.
This works. The behavior is explicit. You can see exactly which rules apply in which case.
But the cost shows up as the system grows. Testability is weak because individual rules are hard to exercise in isolation. Maintainability suffers as strings grow and conditions multiply. Reuse often means copying text. A small change in one place can affect multiple flows.
Over time, reading this kind of prompt feels more like debugging than designing.
Example:
function buildPrompt(options: {
  userType: "enterprise" | "standard";
  environment: "prod" | "staging";
}) {
  let prompt = `
Role
You are a senior QA engineer designing API test plans for production services.
`;

  // persona
  prompt += `
Persona
${
  options.userType === "enterprise"
    ? "You are thorough and risk-aware. You prioritize reliability and compliance."
    : "You are pragmatic and efficient. You prioritize high-signal coverage."
}
`;

  // input contract
  prompt += `
Input
- userType: ${options.userType}
- environment: ${options.environment}
`;

  // steps
  prompt += `
Steps
Step 1: Read the provided input and treat it as the contract.
Step 2: Derive test categories: happy path, validation, error handling, auth, rate limiting, idempotency.
Step 3: Generate test cases with realistic inputs and expected responses.
Step 4: Validate that every field in output strictly matches output schema.
`;

  // tasks
  prompt += `
Tasks
1. Generate a test plan for the given API.
2. Return only JSON that matches output schema.
`;

  // constraints baseline
  prompt += `
Constraints
- Do not include explanations or markdown.
- Do not output extra keys beyond output schema.
- Keep test cases safe for the given environment.
`;

  // guardrails
  prompt += `
Guardrails
- Do not invent endpoints that are not implied by the API name and context.
- Do not include secrets or real tokens.
- If information is missing, leave a placeholder value and continue.
`;

  // conditional constraints
  if (options.environment === "prod") {
    prompt += `
Constraints
- Avoid destructive test cases. Prefer read-only or safely reversible operations.
`;
  }

  if (options.userType === "enterprise") {
    prompt += `
Constraints
- Include edge cases, rate limiting, and failure scenarios.
`;
  }

  // output
  prompt += `
Output
- Return a single JSON object.
- JSON must match output schema exactly.
- Each test case must follow the schema: { name, description, request, expectedResponse }.
- Output only valid JSON.
`;

  return prompt;
}
Approach 2: delegating logic to the LLM
To avoid string concatenation, some teams push the branching into the prompt itself: they describe the rules in plain language and ask the model to apply them.
This looks cleaner at first. There is less code. Everything lives in one place.
But the trade-offs move elsewhere. Logic becomes implicit. Behavior depends on how the model interprets the rules. Testability drops. Debugging becomes guesswork. A small wording change can alter behavior in ways that are hard to predict or reproduce.
When this fails in production, it is often unclear why. Was a rule ignored? Was it interpreted differently? Did a small phrasing change shift the model’s behavior?
Example:
function buildPrompt(options: {
  userType: "enterprise" | "standard";
  environment: "prod" | "staging";
}) {
  return `
Role
You are a senior QA engineer designing API test plans for production services.
Persona
Adjust your behavior based on the following rules:
- If userType is enterprise, be thorough and risk-aware. Prioritize reliability and compliance.
- If userType is standard, be pragmatic and efficient. Prioritize high-signal coverage.
Input
You will receive:
- userType: ${options.userType}
- environment: ${options.environment}
Steps
Step 1: Read the provided input and treat it as the contract.
Step 2: Derive test categories: happy path, validation, error handling, auth, rate limiting, idempotency.
Step 3: Generate test cases with realistic inputs and expected responses.
Step 4: Validate that every field in output strictly matches the output schema.
Tasks
1. Generate a test plan for the given API.
2. Return only JSON that matches the output schema.
Constraints
- Do not include explanations or markdown.
- Do not output extra keys beyond the output schema.
- Keep test cases safe for the given environment.
Apply additional rules:
- If environment is prod, avoid destructive test cases. Prefer read-only or safely reversible operations.
- If userType is enterprise, include edge cases, rate limiting, and failure scenarios.
Guardrails
- Do not invent endpoints that are not implied by the API name and context.
- Do not include secrets or real tokens.
- If information is missing, leave a placeholder value and continue.
Output
- Return a single JSON object.
- JSON must match the output schema exactly.
- Each test case must follow the schema: { name, description, request, expectedResponse }.
- Output only valid JSON.
`;
}
Some teams mix both approaches, but the underlying problems remain.
Pressure 3: structured outputs
The third pressure came from structured outputs.
Once prompts were expected to return structured data (for example, JSON) that fed directly into the next step of the system, failures became more visible. Parsing errors. Schema mismatches. Downstream crashes.
OpenAI introduced structured outputs and schema-based responses. Frameworks like the Vercel AI SDK added output validation and explicit error handling, so invalid responses fail fast before reaching the next step.
These tools solved an important part of the problem. They made failures explicit and protected downstream systems.
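As a concrete example, here is a minimal sketch of schema-validated generation with the Vercel AI SDK and zod. The schema shape is illustrative, and buildPrompt is the function from the earlier examples:
import { generateObject } from "ai";
import { openai } from "@ai-sdk/openai";
import { z } from "zod";
import { buildPrompt } from "./buildPrompt"; // illustrative path

// Illustrative schema for the test plan output described above.
const testPlanSchema = z.object({
  testCases: z.array(
    z.object({
      name: z.string(),
      description: z.string(),
      request: z.string(),
      expectedResponse: z.string(),
    })
  ),
});

const { object } = await generateObject({
  model: openai("gpt-4o"),
  schema: testPlanSchema,
  prompt: buildPrompt({ userType: "enterprise", environment: "prod" }),
});

// `object` is typed and validated. A response that does not match the schema
// throws here instead of crashing something downstream.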
But they focus on validating the output. They do not address how prompts themselves are structured or how they evolve over time.
Introducing composable prompts with promptfmt
Composable prompts are my attempt to address this gap.
Instead of treating a prompt as one growing string, you treat it as a composition of parts. Each part has a clear responsibility. Logic is explicit. Reuse is intentional.
This idea is not new.
SQL moved from raw strings to query builders so queries could be composed safely. Front-end development moved from HTML templates to components so structure and logic could scale.
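The query builder version of that shift looks like this (a minimal sketch using knex; the table and columns are illustrative):
import knex from "knex";

const db = knex({ client: "pg", connection: process.env.DATABASE_URL });

// Conditions compose as calls on a builder instead of strings pasted into raw SQL.
function recentTestRuns(environment: string, onlyFailures: boolean) {
  const query = db("test_runs").select("id", "status").where({ environment });
  if (onlyFailures) query.where({ status: "failed" });
  return query.orderBy("created_at", "desc").limit(50);
}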
I believe prompts can follow a similar path.
I built promptfmt to experiment with this idea. With it, the earlier prompt can be refactored without changing its behavior. What changes is the shape.
import { PromptBuilder, createCondition } from "promptfmt";

function buildPrompt(options: {
  userType: "enterprise" | "standard";
  environment: "prod" | "staging";
}) {
  return new PromptBuilder()
    .role("You are a senior QA engineer designing API test plans for production services")
    .persona((params) => {
      if (params.userType === "enterprise") return "You are thorough and risk-aware. You prioritize reliability and compliance.";
      return "You are pragmatic and efficient. You prioritize high-signal coverage.";
    })
    .steps([
      "Read the provided input and treat it as the contract",
      "Derive test categories: happy path, validation, error handling, auth, rate limiting, idempotency",
      "Generate test cases with realistic inputs and expected responses",
      "Validate that every field in output strictly matches output schema",
    ])
    .tasks([
      "Generate a test plan for the given API",
      "Return only JSON that matches output schema",
    ])
    .constraints([
      "Do not include explanations or markdown",
      "Do not output extra keys beyond output schema",
      "Keep test cases safe for the given environment",
    ])
    .guardrails([
      "Do not invent endpoints that are not implied by the API name and context",
      "Do not include secrets or real tokens",
      "If information is missing, leave a placeholder value and continue",
    ])
    .constraints({
      condition: createCondition(
        (params) => params.environment === "prod",
        "Avoid destructive test cases. Prefer read-only or safely reversible operations."
      ),
    })
    .constraints({
      condition: createCondition(
        (params) => params.userType === "enterprise",
        "Include edge cases, rate limiting, and failure scenarios."
      ),
    })
    .output([
      "Return a single JSON object",
      "Each test case must include: name, description, request, expectedResponse",
    ])
    .build(options);
}
And these are the expected outputs.
// console.log(buildPrompt({userType: "enterprise", environment: "prod"));
Role
You are a senior QA engineer designing API test plans for production services
Persona
You are thorough and risk-aware. You prioritize reliability and compliance.
Steps
Step 1: Read the provided input and treat it as the contract
Step 2: Derive test categories: happy path, validation, error handling, auth, rate limiting, idempotency
Step 3: Generate test cases with realistic inputs and expected responses
Step 4: Validate that every field in output strictly matches output schema
Tasks
1. Generate a test plan for the given API
2. Return only JSON that matches output schema
Constraints
- Do not include explanations or markdown
- Do not output extra keys beyond output schema
- Keep test cases safe for the given environment
Guardrails
- Do not invent endpoints that are not implied by the API name and context
- Do not include secrets or real tokens
- If information is missing, leave a placeholder value and continue
Environment Constraints
Avoid destructive test cases. Prefer read-only or safely reversible operations.
Constraints
Include edge cases, rate limiting, and failure scenarios.
Output
Return a single JSON object
Each test case must include: name, description, request, expectedResponse
// console.log(buildPrompt({userType: "standard", environment: "staging")));
Role
You are a senior QA engineer designing API test plans for production services
Persona
You are pragmatic and efficient. You prioritize high-signal coverage.
Steps
Step 1: Read the provided input and treat it as the contract
Step 2: Derive test categories: happy path, validation, error handling, auth, rate limiting, idempotency
Step 3: Generate test cases with realistic inputs and expected responses
Step 4: Validate that every field in output strictly matches output schema
Tasks
1. Generate a test plan for the given API
2. Return only JSON that matches output schema
Constraints
- Do not include explanations or markdown
- Do not output extra keys beyond output schema
- Keep test cases safe for the given environment
Guardrails
- Do not invent endpoints that are not implied by the API name and context
- Do not include secrets or real tokens
- If information is missing, leave a placeholder value and continue
Output
Return a single JSON object
Each test case must include: name, description, request, expectedResponse
Structure becomes visible. Logic is explicit. Each rule exists as a first-class piece instead of being buried inside text. That makes prompts easier to change and safer to evolve.
Most importantly, prompts become more testable, maintainable, and predictable.
Individual rules can be validated in isolation instead of only through end-to-end runs, as the test sketch below shows.
Adding a new condition does not require rewriting or copying large blocks of text.
Changes have clearer boundaries and fewer unintended side effects.
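For example, the production-only constraint can be pinned down with ordinary unit tests (a minimal sketch assuming vitest, with buildPrompt exported from a module; the import path is illustrative):
import { describe, expect, it } from "vitest";
import { buildPrompt } from "./buildPrompt";

describe("environment constraints", () => {
  it("adds the read-only constraint for prod", () => {
    const prompt = buildPrompt({ userType: "standard", environment: "prod" });
    expect(prompt).toContain("Avoid destructive test cases");
  });

  it("omits it for staging", () => {
    const prompt = buildPrompt({ userType: "standard", environment: "staging" });
    expect(prompt).not.toContain("Avoid destructive test cases");
  });
});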
These qualities are not optional in software engineering. They are the baseline that allows systems to scale.
But there are trade-offs.
Composable prompts add abstraction. They introduce a learning curve. They do not magically solve prompt quality.
There are also cases where this approach is unnecessary. One-off prompts. Prototypes. Low-risk flows that do not return structured data.
But for prompts that sit in the middle of real systems, composability aligns better with how the rest of the codebase evolves.
Closing
Prompts are no longer just text.
They are reused. They carry logic. They return structured data that other systems depend on. That makes testing, maintainability, and clarity important.
As prompts become a stable part of systems, tooling will follow.
promptfmt is one small experiment in that direction. It is early, and it should improve and evolve with real usage and feedback.
If this perspective resonates with you, check it out and contribute ideas or improvements.
If you want to read more about software engineering in the AI era, subscribe to my blog. I will keep sharing what I learn while building systems with AI in the loop.