Prompt Management: Versioning, Testing, Collaboration, Deployment
Introduction
Prompts are the primary interface for controlling LLM behavior, yet most teams manage them as copy-pasted text files or hardcoded strings in source code. As AI applications grow, prompts need the same rigor as application code: versioning, testing, review, staging, and deployment pipelines. This article covers the tools and workflows for professional prompt management.
Prompt as Code
Store prompts in a structured, version-controlled format:
# prompts/summarization.yaml
name: document_summarizer
version: 2.3.0
model: claude-sonnet-4-20250514
parameters:
  temperature: 0.3
  max_tokens: 1024
system_prompt: |
  You are a technical document summarizer. Follow these rules:
  1. Extract the core thesis and key supporting points
  2. Preserve technical accuracy - do not simplify concepts
  3. Maintain the original document's structure
  4. Output in the requested format
  5. Never add information not present in the source
user_template: |
  Document: {document_text}
  Format: {output_format}
  Max length: {max_length} words
  Summary:
tests:
  - input:
      document_text: "Kubernetes is a container orchestration platform..."
      output_format: bullet_points
      max_length: 100
    expected_output_contains: ["container orchestration", "pods"]
    min_length: 50
    max_length: 150
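Loading and rendering a definition like this takes only a few lines. Here is a minimal sketch (the render_prompt helper is illustrative, assuming PyYAML and str.format-style placeholders in the template):
import yaml

def render_prompt(path: str, **inputs) -> tuple[str, str]:
    # Load the YAML definition and fill the user template's placeholders.
    with open(path) as f:
        spec = yaml.safe_load(f)
    return spec["system_prompt"], spec["user_template"].format(**inputs)

system, user = render_prompt(
    "prompts/summarization.yaml",
    document_text="Kubernetes is a container orchestration platform...",
    output_format="bullet_points",
    max_length=100,
)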
Prompt Registry
A central registry stores all prompt versions with metadata:
import difflib
import hashlib
import yaml
from datetime import datetime

class PromptRegistry:
    def __init__(self, storage_backend):
        self.storage = storage_backend

    def register_prompt(self, name: str, prompt_data: dict) -> str:
        version = prompt_data.get("version", "1.0.0")
        prompt_hash = hashlib.sha256(yaml.dump(prompt_data).encode()).hexdigest()[:12]
        entry = {
            "name": name,
            "version": version,
            "hash": prompt_hash,
            "prompt": prompt_data,
            "created_at": datetime.now().isoformat(),
            "status": "draft",
        }
        self.storage.save(f"prompts/{name}/{version}", entry)
        return prompt_hash

    def get_prompt(self, name: str, version: str = "latest") -> dict:
        if version == "latest":
            versions = self.storage.list(f"prompts/{name}")
            # Compare semver components numerically: a plain string sort
            # would rank "10.0.0" below "2.0.0".
            version = max(versions, key=lambda v: tuple(int(p) for p in v.split(".")))
        return self.storage.load(f"prompts/{name}/{version}")

    def promote_to_production(self, name: str, version: str):
        entry = self.storage.load(f"prompts/{name}/{version}")
        entry["status"] = "production"
        entry["promoted_at"] = datetime.now().isoformat()
        self.storage.save(f"prompts/{name}/{version}", entry)

    def diff(self, name: str, version_a: str, version_b: str) -> str:
        prompt_a = self.get_prompt(name, version_a)["prompt"]
        prompt_b = self.get_prompt(name, version_b)["prompt"]
        return self._compute_diff(prompt_a, prompt_b)

    def _compute_diff(self, prompt_a: dict, prompt_b: dict) -> str:
        # Line-by-line diff of the YAML serializations.
        return "\n".join(difflib.unified_diff(
            yaml.dump(prompt_a).splitlines(),
            yaml.dump(prompt_b).splitlines(),
            lineterm="",
        ))
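A typical flow registers a prompt file as a draft, then promotes the exact version that passed review. A sketch (FileStorage is a hypothetical backend; any object with save, load, and list methods works):
registry = PromptRegistry(storage_backend=FileStorage("registry/"))

with open("prompts/summarization.yaml") as f:
    prompt_data = yaml.safe_load(f)

prompt_hash = registry.register_prompt("document_summarizer", prompt_data)
print(f"registered {prompt_hash} as draft")

# After review and passing tests, promote the reviewed version.
registry.promote_to_production("document_summarizer", "2.3.0")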
Automated Prompt Testing
Test prompts against a suite of evaluation cases:
class PromptTester:
    def __init__(self, llm_fn):
        self.llm = llm_fn

    def run_tests(self, prompt_entry: dict) -> dict:
        prompt_data = prompt_entry["prompt"]
        tests = prompt_data.get("tests", [])
        results = {"passed": 0, "failed": 0, "details": []}
        for test in tests:
            try:
                result = self._run_single_test(prompt_data, test)
                results["details"].append(result)
                if result["passed"]:
                    results["passed"] += 1
                else:
                    results["failed"] += 1
            except Exception as e:
                results["failed"] += 1
                results["details"].append({
                    "test": test,
                    "passed": False,
                    "error": str(e),
                })
        results["pass_rate"] = results["passed"] / len(tests) if tests else 1.0
        return results

    def _run_single_test(self, prompt_data: dict, test: dict) -> dict:
        # Build the prompt
        system = prompt_data.get("system_prompt", "")
        template = prompt_data.get("user_template", "")
        inputs = test.get("input", {})
        full_prompt = template.format(**inputs) if inputs else template
        # Run the model
        response = self.llm(system, full_prompt, prompt_data.get("parameters", {}))
        # Check assertions from the test case; the keys mirror the YAML
        # schema above (expected_output_contains, min_length, max_length)
        failures = []
        if "expected_output_contains" in test:
            for phrase in test["expected_output_contains"]:
                # Case-insensitive substring match
                if phrase.lower() not in response.lower():
                    failures.append(f"missing expected phrase: {phrase!r}")
        # Length bounds are interpreted as word counts, matching the
        # "Max length: {max_length} words" wording in the template
        word_count = len(response.split())
        if "min_length" in test and word_count < test["min_length"]:
            failures.append(f"output too short: {word_count} words")
        if "max_length" in test and word_count > test["max_length"]:
            failures.append(f"output too long: {word_count} words")
        return {"test": test, "passed": not failures, "failures": failures}