I shipped a small agent last quarter that reads service design documents and returns a structured production readiness review. It saves our team about an hour per review and catches gaps we usually miss during manual checks. Here is exactly how I built it on Oxlo.ai.
What you'll need
- Python 3.10 or newer
- An Oxlo.ai API key from https://portal.oxlo.ai
- The OpenAI SDK:
    pip install openai
- Pydantic:
    pip install pydantic
Step 1: Define the system prompt
I keep the system prompt in a dedicated string so I can version it easily. The instructions are strict about output format and evaluation criteria, which makes the model less likely to invent checks that are not on the list.
SYSTEM_PROMPT = """You are a senior staff engineer performing a production readiness review. The user will paste a design document or architecture description. Evaluate it across exactly five categories: observability, error handling, security, scalability, and deployment.

Return strictly JSON. Do not include markdown fences or commentary outside the JSON. The JSON schema must be:
{
  "summary": "One paragraph overall assessment.",
  "score": <integer 1-10>,
  "findings": [
    {
      "category": "one of the five categories",
      "status": "PASS" or "FAIL",
      "recommendation": "One concrete, actionable sentence."
    }
  ]
}

A category passes only if the design explicitly describes a reasonable implementation for that concern. Be strict. If a category is not mentioned, mark it FAIL and explain what is missing."""
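Since the prompt is meant to be versioned, one lightweight option is to log a short fingerprint of it with every review, so a change in review behavior can be traced back to a prompt revision. This is only a sketch: `prompt_fingerprint` and `EXAMPLE_PROMPT` are hypothetical names that are not part of the original script.

```python
import hashlib


def prompt_fingerprint(prompt: str, length: int = 12) -> str:
    """Return a short, stable hex digest identifying one prompt revision."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    return digest[:length]


# Hypothetical stand-in; in the real script you would pass SYSTEM_PROMPT.
EXAMPLE_PROMPT = "You are a senior staff engineer performing a production readiness review."
print(f"prompt version: {prompt_fingerprint(EXAMPLE_PROMPT)}")
```

Any edit to the prompt text, even a single character, produces a different fingerprint.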
Step 2: Configure the Oxlo.ai client
I use the OpenAI SDK with Oxlo.ai's base URL. I chose deepseek-v3.2 because it handles structured reasoning well and is available on Oxlo.ai's free tier, so you can test without spending credits.
from openai import OpenAI
import json
import re

# The OpenAI SDK is pointed at Oxlo.ai by overriding the base URL.
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")


def _strip_fences(text: str) -> str:
    # Remove a leading ``` or ```json fence and a trailing ``` fence, in case the model adds them anyway.
    text = text.strip()
    if text.startswith("```"):
        text = re.sub(r"^```(?:json)?\s*", "", text)
        text = re.sub(r"\s*```$", "", text)
    return text


def review_design(text: str):
    # Send the design doc as the user message and parse the JSON reply into a dict.
    response = client.chat.completions.create(
        model="deepseek-v3.2",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    raw = response.choices[0].message.content
    return json.loads(_strip_fences(raw))
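Fence stripping covers the most common deviation, but a model can also wrap the JSON in a sentence or two of commentary. As a possible fallback, here is a sketch that scans for the first parseable JSON object using the standard library's `json.JSONDecoder.raw_decode`. The helper name `extract_json_object` is hypothetical and not part of the original script.

```python
import json


def extract_json_object(text: str) -> dict:
    """Parse the first JSON object found in text, ignoring anything around it."""
    decoder = json.JSONDecoder()
    start = text.find("{")
    while start != -1:
        try:
            # raw_decode parses one JSON value beginning at `start` and
            # tolerates trailing text after it.
            obj, _end = decoder.raw_decode(text, start)
            return obj
        except json.JSONDecodeError:
            # This brace did not open valid JSON; try the next one.
            start = text.find("{", start + 1)
    raise ValueError("no JSON object found in model response")


reply = 'Sure, here is the review:\n{"score": 4, "findings": []}\nLet me know if you need more.'
print(extract_json_object(reply))
```

Raising `ValueError` when nothing parses keeps the fail-fast behavior of the plain `json.loads` call.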
Step 3: Validate with Pydantic
Parsing raw JSON from an LLM without validation is risky. I define a Pydantic model that matches the expected schema so any malformed response fails fast with a clear error.
from pydantic import BaseModel, Field
from typing import List


class Finding(BaseModel):
    category: str
    status: str
    recommendation: str


class ReviewReport(BaseModel):
    summary: str
    score: int = Field(ge=1, le=10)
    findings: List[Finding]


def parse_report(raw: dict) -> ReviewReport:
    return ReviewReport(**raw)
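The model above checks field types and the score range, but it does not confirm that all five categories named in the system prompt actually appear in the findings. A minimal sketch of that extra check, written against plain dicts so it runs without Pydantic, might look like this. `EXPECTED_CATEGORIES` and `check_categories` are hypothetical names, and the lowercase normalization assumes the model may vary its capitalization.

```python
EXPECTED_CATEGORIES = {
    "observability",
    "error handling",
    "security",
    "scalability",
    "deployment",
}


def check_categories(findings: list[dict]) -> list[str]:
    """Return human-readable problems; an empty list means coverage is complete."""
    seen = [str(f.get("category", "")).strip().lower() for f in findings]
    problems = []
    for name in sorted(EXPECTED_CATEGORIES - set(seen)):
        problems.append(f"missing category: {name}")
    for name in sorted(set(seen) - EXPECTED_CATEGORIES):
        problems.append(f"unexpected category: {name!r}")
    for name in sorted({n for n in seen if seen.count(n) > 1}):
        problems.append(f"duplicate category: {name}")
    return problems


sample = [{"category": "Security"}, {"category": "deployment"}]
for problem in check_categories(sample):
    print(problem)
```

Running it on the parsed findings before printing the report would surface a review that silently skipped a category.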
Step 4: Build the CLI
I wrap the reviewer in a small argparse CLI so engineers can point it at any markdown file. Because Oxlo.ai uses flat per-request pricing, running this on a ten-page RFC costs the same as a one-line prompt. See https://oxlo.ai/pricing for details.
import argparse


def main():
    parser = argparse.ArgumentParser(description="Production readiness reviewer")
    parser.add_argument("file", help="Path to design doc markdown file")
    args = parser.parse_args()

    with open(args.file, "r", encoding="utf-8") as f:
        design_text = f.read()

    raw = review_design(design_text)
    report = parse_report(raw)

    print(f"Score: {report.score}/10")
    print(f"Summary: {report.summary}\n")
    for item in report.findings:
        icon = "PASS" if item.status.upper() == "PASS" else "FAIL"
        print(f"[{icon}] {item.category}: {item.recommendation}")


if __name__ == "__main__":
    main()
Run it
Save the code from the steps above in a single file named review.py, then create a sample design document and run the script. The model should return a strict JSON review that the CLI renders into a readable report.
$ cat > design.md <<'EOF'
## Auth Service v2
A new Python FastAPI service handling user authentication against PostgreSQL. Logs print to stdout. Deployed via Docker on a single EC2 instance. No retry logic for DB connections. No monitoring or alerting configured.
EOF
$ python review.py design.md
Example output:
Score: 4/10
Summary: The design covers basic functionality but lacks critical production concerns including observability, resilience, and horizontal scalability.
[FAIL] observability: Add structured logging and export metrics to a monitoring system such as Prometheus.
[FAIL] error handling: Implement retries with exponential backoff for database connections.
[FAIL] security: Document secrets management and TLS termination strategy.
[FAIL] scalability: Stateless design is acceptable, but add load balancing and auto-scaling before production.
[PASS] deployment: Containerization with Docker is a solid start.
Next steps
Wire the script into a CI pipeline so every RFC pull request gets an automatic review comment. You can also add a second Oxlo.ai call using the vision-enabled kimi-k2.6 model to review architecture diagrams when they are attached to the design doc.
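For the CI idea, the script needs two extra pieces: text suitable for a pull request comment and an exit code the pipeline can act on. The sketch below shows one way to do both on a plain dict shaped like the review JSON. The names `render_comment` and `gate_exit_code`, and the `min_score` threshold of 6, are assumptions for illustration and not part of the original script.

```python
def render_comment(report: dict) -> str:
    """Format a review dict as a short markdown comment body."""
    lines = [
        f"Production readiness score: {report['score']}/10",
        "",
        report["summary"],
        "",
    ]
    for item in report["findings"]:
        status = item["status"].upper()
        lines.append(f"- [{status}] {item['category']}: {item['recommendation']}")
    return "\n".join(lines)


def gate_exit_code(report: dict, min_score: int = 6) -> int:
    """Return 0 when the score clears the threshold, 1 otherwise."""
    return 0 if report["score"] >= min_score else 1


# Hypothetical sample data in the same shape as the review JSON.
sample_report = {
    "score": 4,
    "summary": "The design lacks observability and resilience.",
    "findings": [
        {
            "category": "security",
            "status": "fail",
            "recommendation": "Document secrets management.",
        },
    ],
}
print(render_comment(sample_report))
print("exit code:", gate_exit_code(sample_report))
```

In the CLI, the dict could come from the validated report (with Pydantic v2, `report.model_dump()`), and the result of `gate_exit_code` could be passed to `sys.exit` so a low score fails the pipeline step.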