Wilson Xu

Prompt Engineering for Developers: Patterns That Actually Work

Skip the vague advice about "being specific." Here are 12 concrete prompt patterns with working code, test harnesses, and measurable results.


Most prompt engineering content falls into two categories: marketing fluff ("just tell the AI what you want!") or academic theory that doesn't translate to production code. This article is neither.

We'll cover 12 patterns that experienced engineers actually use, with:

  • Real code examples you can run today
  • A test harness for measuring which prompts work
  • Failure modes to avoid
  • Cost/quality tradeoffs for each approach

By the end, you'll have a systematic method for building and evaluating prompts — not just a list of tips.

The Developer's Mental Model for Prompts

Think of a prompt as a function signature:

// Bad: unclear contract
const output = await claude("help me with my code");

// Good: explicit contract
const output = await claude({
  task: "Review this function for security vulnerabilities",
  input: { code: functionBody, language: "javascript" },
  format: "JSON array of {severity, description, line, fix}",
  constraints: ["Only flag real issues, not style preferences"]
});

Every prompt has four parts, whether implicit or explicit: a task, input context, an output format, and constraints. Making all four explicit dramatically improves consistency.

Setup: A Testable Prompt Framework

Before covering patterns, here's the evaluation harness we'll use throughout:

// prompt-eval.js
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

/**
 * Run a prompt against multiple test cases and measure quality
 */
export async function evalPrompt(promptFn, testCases, {
  model = 'claude-haiku-4-5',
  judge = null,
  verbose = false,
} = {}) {
  const results = [];

  for (const testCase of testCases) {
    const start = Date.now();

    const prompt = promptFn(testCase.input);
    const response = await client.messages.create({
      model,
      max_tokens: 1024,
      messages: [{ role: 'user', content: prompt }],
    });

    const output = response.content[0].text;
    const latency = Date.now() - start;
    const tokens = response.usage.input_tokens + response.usage.output_tokens;

    // If a judge function is provided, score the output
    let score = null;
    if (judge) {
      score = await judge(testCase.input, output, testCase.expected);
    }

    results.push({
      input: testCase.input,
      output,
      expected: testCase.expected,
      latency,
      tokens,
      score,
      passed: score !== null ? score >= 0.7 : null,
    });

    if (verbose) {
      console.log(`\nInput: ${JSON.stringify(testCase.input).slice(0, 100)}`);
      console.log(`Output: ${output.slice(0, 200)}`);
      console.log(`Latency: ${latency}ms | Tokens: ${tokens} | Score: ${score}`);
    }
  }

  const scored = results.filter(r => r.score !== null);
  const avgScore = scored.length
    ? scored.reduce((sum, r) => sum + r.score, 0) / scored.length
    : null; // null when no judge was provided, instead of NaN

  const avgTokens = results.reduce((sum, r) => sum + r.tokens, 0) / results.length;
  const avgLatency = results.reduce((sum, r) => sum + r.latency, 0) / results.length;

  return {
    results,
    summary: {
      passRate: results.filter(r => r.passed).length / results.length,
      avgScore,
      avgTokens,
      avgLatency,
      totalCost: estimateCost(results.reduce((sum, r) => sum + r.tokens, 0), model),
    },
  };
}

function estimateCost(totalTokens, model = 'claude-haiku-4-5') {
  // Flat per-token rates in USD (real pricing differs for input vs. output tokens)
  const rates = {
    'claude-haiku-4-5': 0.25 / 1_000_000,
    'claude-sonnet-4-5': 3 / 1_000_000,
    'claude-opus-4-5': 15 / 1_000_000,
  };
  const rate = rates[model] ?? rates['claude-haiku-4-5'];
  return (totalTokens * rate).toFixed(4); // dollars
}

Pattern 1: Structured Output with JSON Schema

The most reliable way to get consistent, parseable output.

// pattern-structured-output.js

function classifyBugPrompt(bugReport) {
  return `Classify the following bug report and return a JSON object.

Bug report:
"""
${bugReport}
"""

Return ONLY a JSON object with this exact schema (no markdown, no explanation):
{
  "severity": "critical" | "high" | "medium" | "low",
  "category": "crash" | "performance" | "data-loss" | "ui" | "security" | "other",
  "affectedUsers": "all" | "some" | "few",
  "reproducible": true | false,
  "priority": 1-5,
  "summary": "One sentence description"
}`;
}

// Test it
const testCases = [
  {
    input: "App crashes immediately when clicking the 'Export' button for any user with more than 1000 items",
    expected: { severity: 'critical', reproducible: true },
  },
  {
    input: "The button color looks slightly off on some Samsung phones",
    expected: { severity: 'low', category: 'ui' },
  },
];

// Judge: check if required fields match expected values
async function structuredJudge(input, output, expected) {
  try {
    const parsed = JSON.parse(output);
    const matches = Object.entries(expected)
      .filter(([k, v]) => parsed[k] === v).length;
    return matches / Object.keys(expected).length;
  } catch {
    return 0; // Invalid JSON = automatic failure
  }
}

const { summary } = await evalPrompt(classifyBugPrompt, testCases, {
  judge: structuredJudge,
  verbose: true,
});
console.log('Structured output score:', summary);

Key rules for structured output:

  1. Say "return ONLY JSON" — no markdown fence blocks
  2. Define the exact schema inline, not just in words
  3. Use union types for enums ("a" | "b" | "c")
  4. Wrap JSON.parse in a try/catch — always. Even with explicit instructions, models occasionally add explanatory text
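Rule 4 is worth a helper. A defensive parser that strips stray markdown fences before parsing — a minimal sketch, not tied to any SDK:

```javascript
// Defensively parse model output that should be JSON.
// Handles the common failure mode where the model wraps the object
// in ```json fences or adds a sentence before or after it.
function parseModelJson(output) {
  // Strip markdown code fences if present
  const stripped = output.replace(/```(?:json)?/g, '').trim();

  // Fall back to grabbing the first {...} or [...] span
  const match = stripped.match(/\{[\s\S]*\}|\[[\s\S]*\]/);
  if (!match) {
    throw new Error('No JSON object or array found in output');
  }
  return JSON.parse(match[0]);
}
```

It behaves identically to plain `JSON.parse` on clean output, so there's no cost to routing every response through it.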

Pattern 2: Chain-of-Thought (CoT)

Make the model reason before answering. Dramatically improves accuracy on multi-step problems.

// pattern-cot.js

// ❌ Without CoT: model jumps to answer, often wrong
function withoutCoT(problem) {
  return `Answer this programming question: ${problem}`;
}

// ✅ With CoT: model works through it step by step
function withCoT(problem) {
  return `Answer this programming question.

Before giving your final answer, work through it step by step:

1. Identify what the question is really asking
2. List any relevant edge cases or constraints
3. Think through your approach
4. Check your reasoning for errors
5. Then give your final answer

Question: ${problem}

Begin your reasoning:`;
}

// Even more powerful: few-shot CoT
function fewShotCoT(problem) {
  return `Solve programming problems step by step.

Example 1:
Q: What's the time complexity of finding duplicates in an array of n integers using a Set?
Thinking:
- We iterate through the array once: O(n)
- For each element, we do a Set lookup/insert: O(1) average
- Total: O(n) time, O(n) space
Answer: O(n) time complexity, O(n) space complexity.

Example 2:
Q: If I have a recursive function that calls itself twice per level and goes n levels deep, what's the time complexity?
Thinking:
- Level 0: 1 call
- Level 1: 2 calls
- Level 2: 4 calls
- Level n: 2^n calls
- Total work: 2^0 + 2^1 + ... + 2^n = 2^(n+1) - 1
Answer: O(2^n) time complexity.

Now solve this:
Q: ${problem}
Thinking:`;
}

When to use CoT:

  • Math or logic problems
  • Debugging complex code
  • Analyzing requirements with multiple constraints
  • Any task where the answer depends on multiple steps

When NOT to use CoT:

  • Simple classification tasks (adds tokens without benefit)
  • When you need strict JSON output (freeform reasoning text breaks parseability)
  • Latency-critical paths
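The JSON conflict has a common workaround (my addition, not from the patterns above): instruct the model to reason freely, end with a marker line such as `FINAL:`, and parse only what follows it. A sketch of the extraction side:

```javascript
// Extract the final answer from a chain-of-thought response that
// ends with a marker line. Everything before the last occurrence of
// the marker is reasoning and gets discarded.
function extractFinal(output, marker = 'FINAL:') {
  const idx = output.lastIndexOf(marker);
  if (idx === -1) {
    throw new Error(`Marker "${marker}" not found in output`);
  }
  return output.slice(idx + marker.length).trim();
}
```

Pair it with a prompt that ends with something like "...then write FINAL: followed by your answer alone."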

Pattern 3: System Prompts for Persona and Constraints

System prompts aren't just for setting a role — they're where you encode constraints, style guides, and domain knowledge.

// pattern-system-prompts.js
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic();

// ❌ Weak system prompt
const weakSystem = "You are a helpful coding assistant.";

// ✅ Strong system prompt
const strongSystem = `You are a senior software engineer at a fintech company.

EXPERTISE:
- Node.js, TypeScript, PostgreSQL, Redis
- Financial systems: double-entry accounting, transaction processing, PCI compliance
- Security-first mindset

COMMUNICATION STYLE:
- Direct and technical — no unnecessary hedging
- Use code examples over prose explanations
- If you're unsure about a financial regulation, say so explicitly

CONSTRAINTS:
- Never suggest storing raw card numbers (use tokenization)
- Always mention PCI DSS implications for payment-related code
- Default to TypeScript unless the user specifically asks for JavaScript
- Prefer async/await over callbacks or raw Promises
- Include error handling in all code examples

OUTPUT FORMAT:
- For code: provide the complete function, not snippets
- For architecture: use ASCII diagrams for structure
- For reviews: list issues with severity (CRITICAL/HIGH/MEDIUM/LOW)`;

// Build reusable prompt builders
class PromptBuilder {
  constructor(systemPrompt) {
    this.systemPrompt = systemPrompt;
    this.fewShots = [];
  }

  addExample(userMessage, assistantResponse) {
    this.fewShots.push({ user: userMessage, assistant: assistantResponse });
    return this;
  }

  build(userMessage) {
    return {
      system: this.systemPrompt,
      messages: [
        ...this.fewShots.flatMap(({ user, assistant }) => [
          { role: 'user', content: user },
          { role: 'assistant', content: assistant },
        ]),
        { role: 'user', content: userMessage },
      ],
    };
  }

  async call(userMessage, model = 'claude-sonnet-4-5') {
    const { system, messages } = this.build(userMessage);
    return client.messages.create({
      model,
      max_tokens: 2048,
      system,
      messages,
    });
  }
}

// Usage
const codeReviewer = new PromptBuilder(strongSystem)
  .addExample(
    'Review this function:\nfunction transfer(from, to, amount) {\n  db.query(`UPDATE accounts SET balance = balance - ${amount} WHERE id = ${from}`);\n}',
    'CRITICAL: SQL injection vulnerability. `amount` and `from` are interpolated directly into the query. Use parameterized queries:\n```ts\nawait db.query("UPDATE accounts SET balance = balance - $1 WHERE id = $2", [amount, from]);\n```\nAlso missing: transaction (non-atomic), negative amount check, balance validation.'
  );

const review = await codeReviewer.call(userCode);

Pattern 4: Few-Shot Learning

Show the model examples of what you want before asking for it. The most underused technique.

// pattern-few-shot.js

// For extracting structured data from messy text
function extractApiParams(docs) {
  return `Extract API parameters from documentation into a JSON array.

---
Example 1:
Input: "The endpoint accepts a userId (required, string, the user's ID) and limit (optional, integer, defaults to 20, max 100)"
Output: [
  {"name": "userId", "type": "string", "required": true, "description": "the user's ID"},
  {"name": "limit", "type": "integer", "required": false, "default": 20, "max": 100}
]

---
Example 2:
Input: "Pass apiKey in the Authorization header. Include page (1-based integer) and per_page (10-50, default 25)"
Output: [
  {"name": "apiKey", "type": "string", "required": true, "location": "header", "header": "Authorization"},
  {"name": "page", "type": "integer", "required": false, "min": 1},
  {"name": "per_page", "type": "integer", "required": false, "default": 25, "min": 10, "max": 50}
]

---
Now extract from:
Input: "${docs}"
Output:`;
}

// For code transformation tasks
function modernizeCode(legacyCode) {
  return `Convert legacy JavaScript to modern ES2024+ syntax.

---
Example 1:
Input:
\`\`\`js
var users = [];
for (var i = 0; i < data.length; i++) {
  if (data[i].active) {
    users.push(data[i].name);
  }
}
\`\`\`
Output:
\`\`\`js
const users = data.filter(u => u.active).map(u => u.name);
\`\`\`

---
Example 2:
Input:
\`\`\`js
function fetchUser(id, callback) {
  db.find(id, function(err, user) {
    if (err) { callback(err); return; }
    callback(null, user);
  });
}
\`\`\`
Output:
\`\`\`js
async function fetchUser(id) {
  return await db.find(id);
}
\`\`\`

---
Now convert:
\`\`\`js
${legacyCode}
\`\`\`
Output:`;
}

Rule of thumb: Use 2-3 examples. More than 5 rarely improves results and increases token cost. Choose examples that cover edge cases, not just the happy path.
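That rule of thumb is easy to enforce mechanically. A small helper that assembles the `---`-delimited few-shot format used above and caps the example count — a sketch; the cap of 3 is this article's guideline, not an API requirement:

```javascript
// Build a few-shot prompt in the "---"-delimited style used above.
// Caps examples at maxExamples so token cost stays bounded.
function buildFewShotPrompt(instruction, examples, input, maxExamples = 3) {
  const shots = examples.slice(0, maxExamples)
    .map((ex, i) => `Example ${i + 1}:\nInput: ${ex.input}\nOutput: ${ex.output}`)
    .join('\n\n---\n');

  return `${instruction}\n\n---\n${shots}\n\n---\nNow:\nInput: ${input}\nOutput:`;
}
```

Keeping assembly in one place also makes it trivial to A/B different example sets through the eval harness.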

Pattern 5: Role-Based Prompting with Expertise Injection

Tell the model who it is and what it knows, not just what to do:

// pattern-role.js

// Generic (weak)
const generic = "Review this architecture diagram.";

// Role-specific (strong)
const roleSpecific = `You are a distributed systems architect with 12 years of experience.
Your specialty is high-availability systems that process 10M+ requests/day.
You've dealt with outages at scale and have strong opinions about failure modes.

Review the following architecture and identify:
1. Single points of failure
2. Scaling bottlenecks
3. Missing redundancy
4. Any design decisions that will cause problems at 10x current load

Be direct. This team is shipping to production next week.

Architecture:
${architectureDescription}`;

// Adversarial role for security review
const securityReview = `You are a penetration tester who specializes in web API security.
You think like an attacker — your job is to find vulnerabilities, not praise good code.
Previous clients have paid you $50k to find SQL injection and auth bypasses.

Review this API endpoint code with maximum skepticism:
${apiCode}

For each vulnerability found, describe exactly how an attacker would exploit it.`;

Pattern 6: Constraint-Based Prompting

Negative constraints often matter as much as positive instructions:

// pattern-constraints.js

function writeCommitMessage(diff) {
  return `Write a git commit message for this diff.

REQUIREMENTS:
- First line: 50 characters or less, imperative mood ("Add" not "Added")
- Second line: blank
- Body (if needed): 72 char wrap, explain WHY not WHAT
- Use conventional commits format: feat/fix/refactor/docs/test/chore

DO NOT:
- Start with "This commit..."
- Use past tense
- Include file names in the subject
- Write more than 3 body paragraphs
- Use vague words like "update", "change", "fix things"

Diff:
${diff}

Commit message:`;
}

// Testing constraints are satisfied
async function testConstraints() {
  const violations = [];

  const response = await client.messages.create({
    model: 'claude-haiku-4-5',
    max_tokens: 256,
    messages: [{ role: 'user', content: writeCommitMessage(sampleDiff) }],
  });

  const message = response.content[0].text.trim();
  const lines = message.split('\n');
  const subject = lines[0];

  if (subject.length > 50) violations.push(`Subject too long: ${subject.length} chars`);
  if (/^(Added|Updated|Changed|Fixed)/i.test(subject)) violations.push('Past tense');
  if (/^This commit/i.test(subject)) violations.push('Starts with "This commit"');
  if (lines[1] && lines[1].trim() !== '') violations.push('Missing blank line after subject');

  return { message, violations };
}

Pattern 7: Iterative Refinement (Self-Critique)

Have the model check its own work before returning:

// pattern-self-critique.js

async function generateWithReview(task) {
  // Step 1: Generate initial response
  const draft = await client.messages.create({
    model: 'claude-sonnet-4-5',
    max_tokens: 2048,
    messages: [
      { role: 'user', content: task },
    ],
  });

  const draftText = draft.content[0].text;

  // Step 2: Critique the draft
  const critique = await client.messages.create({
    model: 'claude-sonnet-4-5',
    max_tokens: 1024,
    messages: [
      { role: 'user', content: task },
      { role: 'assistant', content: draftText },
      {
        role: 'user',
        content: `Review your response above. Check for:
1. Logical errors or incorrect claims
2. Missing edge cases
3. Unclear explanations
4. Security vulnerabilities (if code)

List specific issues. If there are none, say "No issues found."`,
      },
    ],
  });

  const critiqueText = critique.content[0].text;

  // If no issues, return the draft
  if (critiqueText.toLowerCase().includes('no issues found')) {
    return draftText;
  }

  // Step 3: Revise based on critique
  const revised = await client.messages.create({
    model: 'claude-sonnet-4-5',
    max_tokens: 2048,
    messages: [
      { role: 'user', content: task },
      { role: 'assistant', content: draftText },
      { role: 'user', content: `Here is a critique of your response:\n${critiqueText}\n\nPlease revise to address these issues.` },
    ],
  });

  return revised.content[0].text;
}

Note: This pattern costs roughly 3x as many tokens as a single call (up to three requests per output). Use it for high-stakes outputs (production code, important documentation), not for every call.

Pattern 8: Template Variables and Prompt Factories

For production systems, manage prompts like code:

// prompt-factory.js

class PromptTemplate {
  constructor(template) {
    this.template = template;
    this.variables = this.extractVariables(template);
  }

  extractVariables(template) {
    const matches = template.matchAll(/\{\{(\w+)\}\}/g);
    return new Set([...matches].map(m => m[1]));
  }

  render(variables) {
    // Validate all required variables are provided
    for (const varName of this.variables) {
      if (!(varName in variables)) {
        throw new Error(`Missing required variable: ${varName}`);
      }
    }

    return this.template.replace(/\{\{(\w+)\}\}/g, (_, name) => variables[name]);
  }
}

// Define prompts as templates
const PROMPTS = {
  codeReview: new PromptTemplate(`
Review this {{language}} code for a {{context}} system.

Code:
\`\`\`{{language}}
{{code}}
\`\`\`

Focus on: {{focusAreas}}

Format as JSON: {"issues": [...], "score": 1-10}
`),

  bugReport: new PromptTemplate(`
You are investigating a {{severity}} bug in {{component}}.

Error: {{errorMessage}}
Stack trace:
{{stackTrace}}

User context: {{userContext}}

Identify the root cause and suggest a fix.
`),
};

// Usage
const reviewPrompt = PROMPTS.codeReview.render({
  language: 'TypeScript',
  context: 'payment processing',
  code: userCode,
  focusAreas: 'security, error handling, transaction atomicity',
});

Pattern 9: Parallel Prompting

Run multiple specialized prompts simultaneously for comprehensive analysis:

// pattern-parallel.js

async function comprehensiveReview(code) {
  const [securityReview, performanceReview, styleReview] = await Promise.all([
    client.messages.create({
      model: 'claude-haiku-4-5',
      max_tokens: 1024,
      system: 'You are a security expert. Only report security vulnerabilities.',
      messages: [{ role: 'user', content: `Review for security:\n\`\`\`\n${code}\n\`\`\`` }],
    }),
    client.messages.create({
      model: 'claude-haiku-4-5',
      max_tokens: 1024,
      system: 'You are a performance engineer. Only report performance issues.',
      messages: [{ role: 'user', content: `Review for performance:\n\`\`\`\n${code}\n\`\`\`` }],
    }),
    client.messages.create({
      model: 'claude-haiku-4-5',
      max_tokens: 1024,
      system: 'You are a code reviewer focused on maintainability and readability.',
      messages: [{ role: 'user', content: `Review for code quality:\n\`\`\`\n${code}\n\`\`\`` }],
    }),
  ]);

  return {
    security: securityReview.content[0].text,
    performance: performanceReview.content[0].text,
    style: styleReview.content[0].text,
  };
}
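One caveat: `Promise.all` rejects as soon as any single review fails, discarding the ones that succeeded. For independent calls, `Promise.allSettled` is usually the better fit. A sketch of the aggregation logic, using plain promises so it runs without an API client:

```javascript
// Aggregate independent review calls without letting one failure
// discard the others. Each settled result becomes either the resolved
// text or an error placeholder string.
async function settleReviews(named) {
  const entries = Object.entries(named);
  const settled = await Promise.allSettled(entries.map(([, p]) => p));

  return Object.fromEntries(entries.map(([name], i) => {
    const r = settled[i];
    return [name, r.status === 'fulfilled' ? r.value : `FAILED: ${r.reason.message}`];
  }));
}
```

In `comprehensiveReview`, you would pass the three `client.messages.create(...)` promises (mapped to their `.content[0].text`) instead of the dummy values shown here.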

Pattern 10: Dynamic Examples (Retrieval-Augmented Prompting)

Choose few-shot examples dynamically based on the current input:

// pattern-dynamic-examples.js
import { cosineSimilarity } from './utils.js';

// Store examples with embeddings
const exampleStore = [
  { input: 'parse CSV with quoted fields', example: '...', embedding: null },
  { input: 'handle HTTP rate limiting', example: '...', embedding: null },
  { input: 'validate email addresses', example: '...', embedding: null },
  // ... more examples
];

async function embedText(text) {
  // Use any embedding model (e.g., OpenAI, Cohere, or local)
  const response = await fetch('https://api.openai.com/v1/embeddings', {
    method: 'POST',
    headers: {
      'Authorization': `Bearer ${process.env.OPENAI_API_KEY}`,
      'Content-Type': 'application/json',
    },
    body: JSON.stringify({ model: 'text-embedding-3-small', input: text }),
  });
  const data = await response.json();
  return data.data[0].embedding;
}

async function findRelevantExamples(query, topK = 3) {
  const queryEmbedding = await embedText(query);

  return exampleStore
    .map(ex => ({ ...ex, similarity: cosineSimilarity(queryEmbedding, ex.embedding) }))
    .sort((a, b) => b.similarity - a.similarity)
    .slice(0, topK);
}

async function buildDynamicPrompt(userQuery) {
  const relevantExamples = await findRelevantExamples(userQuery);

  const examplesSection = relevantExamples
    .map(ex => `Example (similar to your request):\n${ex.example}`)
    .join('\n\n---\n\n');

  return `${examplesSection}\n\n---\n\nNow help with:\n${userQuery}`;
}
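The `cosineSimilarity` helper imported from `./utils.js` above isn't shown; a minimal implementation (export it from `utils.js` to match the import):

```javascript
// Cosine similarity between two equal-length embedding vectors:
// dot(a, b) / (|a| * |b|). Returns a value in [-1, 1], where 1 means
// the vectors point in the same direction.
function cosineSimilarity(a, b) {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}
```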

Pattern 11: Output Validation and Retry

Never trust raw LLM output in production. Always validate, and retry with correction hints on failure:

// pattern-validation.js

async function callWithValidation(prompt, validator, maxRetries = 3) {
  let lastError = null;
  let lastOutput = null;

  for (let attempt = 1; attempt <= maxRetries; attempt++) {
    const messages = [{ role: 'user', content: prompt }];

    // On retry, add the previous failed output as context
    if (lastError && lastOutput !== null) {
      messages.push({ role: 'assistant', content: lastOutput });
      messages.push({
        role: 'user',
        content: `Your previous response failed validation: ${lastError}. Please fix it.`,
      });
    }

    const response = await client.messages.create({
      model: 'claude-haiku-4-5',
      max_tokens: 1024,
      messages,
    });

    const output = response.content[0].text;
    lastOutput = output;

    try {
      const validated = validator(output);
      return { output: validated, attempts: attempt };
    } catch (error) {
      lastError = error.message;
      console.warn(`Attempt ${attempt} failed: ${error.message}`);
    }
  }

  throw new Error(`Failed after ${maxRetries} attempts. Last error: ${lastError}`);
}

// Example validators
const validators = {
  json: (output) => {
    const match = output.match(/\{[\s\S]*\}|\[[\s\S]*\]/);
    if (!match) throw new Error('No JSON found in output');
    return JSON.parse(match[0]);
  },

  semver: (output) => {
    const match = output.trim().match(/^(\d+)\.(\d+)\.(\d+)$/);
    if (!match) throw new Error(`Not a valid semver: ${output.trim()}`);
    return output.trim();
  },

  nonEmpty: (output) => {
    if (!output.trim()) throw new Error('Empty response');
    return output;
  },
};

// Usage
const result = await callWithValidation(
  "What version should I bump to? Current: 1.2.3. I fixed a bug. Return ONLY the semver number.",
  validators.semver,
);

Pattern 12: Model Selection Strategy

Different models for different tasks — this is the biggest lever for cost optimization:

// pattern-model-selection.js

const MODEL_TIERS = {
  fast: 'claude-haiku-4-5',      // $0.25/1M tokens - simple tasks
  balanced: 'claude-sonnet-4-5', // $3/1M tokens - most tasks
  powerful: 'claude-opus-4-5',   // $15/1M tokens - complex reasoning
};

function selectModel(task) {
  // Fast tier: simple classification, formatting, extraction
  if (
    task.type === 'classify' ||
    task.type === 'format' ||
    task.type === 'extract' ||
    (task.outputTokens || 0) < 200
  ) {
    return MODEL_TIERS.fast;
  }

  // Powerful tier: complex reasoning, architecture decisions
  if (
    task.type === 'architect' ||
    task.type === 'debug-complex' ||
    task.requiresExpertise === true
  ) {
    return MODEL_TIERS.powerful;
  }

  // Default: balanced for everything else
  return MODEL_TIERS.balanced;
}

// Intelligent routing example
async function routedCodeReview(code, language) {
  // First pass: quick classification (cheap)
  const classification = await client.messages.create({
    model: MODEL_TIERS.fast,
    max_tokens: 100,
    messages: [{
      role: 'user',
      content: `Rate this code's complexity from 1-5. Return just the number.\n\n${code.slice(0, 500)}`,
    }],
  });

  const complexity = parseInt(classification.content[0].text.trim(), 10) || 3; // default to mid if unparseable

  // Route to appropriate model
  const model = complexity >= 4 ? MODEL_TIERS.powerful : MODEL_TIERS.balanced;

  const review = await client.messages.create({
    model,
    max_tokens: 2048,
    messages: [{
      role: 'user',
      content: `Review this ${language} code:\n\`\`\`\n${code}\n\`\`\``,
    }],
  });

  return {
    model,
    complexity,
    review: review.content[0].text,
  };
}
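To make the routing payoff concrete, a quick cost calculator at the per-token rates listed above (flat rates for illustration; real pricing differs for input vs. output tokens):

```javascript
// Dollar cost of a workload at each tier's flat per-token rate,
// using the rates from MODEL_TIERS above.
const RATES_PER_TOKEN = {
  fast: 0.25 / 1_000_000,
  balanced: 3 / 1_000_000,
  powerful: 15 / 1_000_000,
};

function workloadCost(totalTokens, tier) {
  return totalTokens * RATES_PER_TOKEN[tier];
}

// 10M tokens/month: $2.50 on fast vs. $150 on powerful, a 60x
// difference for tasks the small model may handle just as well.
```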

Building a Prompt Test Suite

Treat prompts as code. Put them in version control, write tests, run them in CI:

// prompt-tests/review-prompt.test.js
import { describe, it, expect } from 'vitest';
import { evalPrompt } from '../prompt-eval.js';
import { classifyBugPrompt } from '../prompts/classify-bug.js';

describe('Bug classification prompt', () => {
  const testCases = [
    {
      input: 'App crashes on startup for all users after the latest deploy',
      expected: { severity: 'critical', affectedUsers: 'all' },
    },
    {
      input: 'The loading spinner is slightly off-center on Firefox',
      expected: { severity: 'low', category: 'ui' },
    },
    {
      input: 'Passwords are stored in plain text in the logs',
      expected: { severity: 'critical', category: 'security' },
    },
  ];

  it('achieves >90% accuracy on severity classification', async () => {
    const { summary } = await evalPrompt(classifyBugPrompt, testCases, {
      judge: jsonFieldJudge(['severity', 'category']),
    });
    expect(summary.avgScore).toBeGreaterThan(0.9);
  }, 30_000); // 30s timeout for API calls

  it('always returns valid JSON', async () => {
    const { results } = await evalPrompt(classifyBugPrompt, testCases);
    for (const result of results) {
      expect(() => JSON.parse(result.output)).not.toThrow();
    }
  }, 30_000);
});
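The `jsonFieldJudge` helper used in the first test isn't defined in the harness above; a minimal implementation consistent with how `structuredJudge` scores (fraction of checked fields that match):

```javascript
// Returns a judge function that parses the model output as JSON and
// scores the fraction of the listed fields that match the expected
// values. Invalid JSON scores 0, like structuredJudge above.
function jsonFieldJudge(fields) {
  return async (input, output, expected) => {
    let parsed;
    try {
      parsed = JSON.parse(output);
    } catch {
      return 0;
    }
    const checked = fields.filter(f => f in expected);
    if (checked.length === 0) return 1;
    const matches = checked.filter(f => parsed[f] === expected[f]).length;
    return matches / checked.length;
  };
}
```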

The Prompt Engineering Checklist

Before shipping any prompt to production, verify:

□ Is the task unambiguous? Could different engineers interpret this differently?
□ Is the output format explicitly specified?
□ Are all constraints listed as DO/DO NOT?
□ Does it have 2-3 examples for complex tasks?
□ Is there output validation + retry logic?
□ Is it using the right model tier (cost vs. quality)?
□ Have you tested it against edge cases?
□ Is the prompt stored in version control (not hardcoded)?
□ Is temperature set explicitly (lower for structured output, higher for creative)?
□ Do you have cost monitoring in place?

Conclusion

Prompt engineering is engineering. Apply the same rigor you'd apply to any other system:

  • Test systematically, not just by eyeballing outputs
  • Version control your prompts — they're code
  • Monitor production behavior — prompts that work in testing can fail on real data
  • Measure the cost — a 3x better prompt isn't worth it if it costs 10x more
  • Iterate based on data, not intuition

The patterns here — structured output, chain-of-thought, few-shot, self-critique, validation with retry — cover the vast majority of production use cases. Master these twelve and you'll be able to build AI features that actually work reliably.


Testing harness and all examples available at github.com/chengyixu/prompt-patterns
