LLM for Engineering: Best Practices and Applications

I shipped a small agent last quarter that reads service design documents and returns a structured production readiness review. It saves our team about an hour per review and catches gaps we usually miss during manual checks. Here is exactly how I built it on Oxlo.ai.

What you'll need

Python 3.10 or newer
An Oxlo.ai API key from https://portal.oxlo.ai
The OpenAI SDK: pip install openai
Pydantic: pip install pydantic

Step 1: Define the system prompt

I keep the system prompt in a dedicated string so I can version it easily. The instructions are strict about output format and evaluation criteria, which cuts down on hallucinated checks.

SYSTEM_PROMPT = """You are a senior staff engineer performing a production readiness review. The user will paste a design document or architecture description. Evaluate it across exactly five categories: observability, error handling, security, scalability, and deployment.

Return strictly JSON. Do not include markdown fences or commentary outside the JSON. The JSON schema must be:
{
  "summary": "One paragraph overall assessment.",
  "score": <integer 1-10>,
  "findings": [
    {
      "category": "one of the five categories",
      "status": "PASS" or "FAIL",
      "recommendation": "One concrete, actionable sentence."
    }
  ]
}

A category passes only if the design explicitly describes a reasonable implementation for that concern. Be strict. If a category is not mentioned, mark it FAIL and explain what is missing."""
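Because the prompt is a plain string, one way to make its versions traceable is to fingerprint it and log that value next to each review. This is a minimal sketch, not part of my original script: `prompt_fingerprint` is a hypothetical helper, and the 12-character SHA-256 prefix is an arbitrary choice.

```python
import hashlib

def prompt_fingerprint(prompt: str, length: int = 12) -> str:
    """Return a short, stable identifier for a prompt string (hypothetical helper)."""
    digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    # A prefix of the hex digest is enough to tell prompt revisions apart in logs.
    return digest[:length]

# Any edit to the prompt text, however small, yields a different fingerprint.
v1 = prompt_fingerprint("Evaluate it across exactly five categories.")
v2 = prompt_fingerprint("Evaluate it across exactly six categories.")
print(v1 != v2)
```

In the script itself this would be called as `prompt_fingerprint(SYSTEM_PROMPT)` and the result stored with the review output.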

Step 2: Configure the Oxlo.ai client

I use the OpenAI SDK with Oxlo.ai's base URL. I chose deepseek-v3.2 because it handles structured reasoning well and is available on Oxlo.ai's free tier, so you can test without spending credits.

from openai import OpenAI
import json
import re

client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")

def _strip_fences(text: str) -> str:
    text = text.strip()
    if text.startswith("```"):
        text = re.sub(r"^```(?:json)?\s*", "", text)
        text = re.sub(r"\s*```$", "", text)
    return text

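def review_design(text: str) -> dict:
    """Send the design doc to the model and return the parsed JSON review."""
    response = client.chat.completions.create(
        model="deepseek-v3.2",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": text},
        ],
    )
    # message.content can be None (for example on an empty completion), so fall back to "".
    raw = response.choices[0].message.content or ""
    return json.loads(_strip_fences(raw))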
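Two small hardening tweaks to the code above can be sketched as follows: reading the key from the environment instead of hardcoding it, and falling back to the outermost `{...}` span when the model adds commentary around the JSON despite the prompt. Both helpers are hypothetical, and the variable name `OXLO_API_KEY` is my assumption, not something Oxlo.ai prescribes.

```python
import json
import os

def load_api_key(env_var: str = "OXLO_API_KEY") -> str:
    """Read the API key from the environment (the variable name is an assumption)."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable before running.")
    return key

def extract_json_object(text: str) -> dict:
    """Parse a JSON object from model output, tolerating surrounding prose."""
    try:
        parsed = json.loads(text)
    except json.JSONDecodeError:
        # Fall back to the span between the first "{" and the last "}".
        start, end = text.find("{"), text.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("No JSON object found in model output")
        parsed = json.loads(text[start : end + 1])
    if not isinstance(parsed, dict):
        raise ValueError("Expected a JSON object at the top level")
    return parsed
```

With these in place, the client could be created with `api_key=load_api_key()`, and `review_design` could return `extract_json_object(raw)` instead of calling `json.loads` directly.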

Step 3: Validate with Pydantic

Parsing raw JSON from an LLM without validation is risky. I define a Pydantic model that matches the expected schema so any malformed response fails fast with a clear error.

from pydantic import BaseModel, Field
from typing import List

class Finding(BaseModel):
    category: str
    status: str
    recommendation: str

class ReviewReport(BaseModel):
    summary: str
    score: int = Field(ge=1, le=10)
    findings: List[Finding]

def parse_report(raw: dict) -> ReviewReport:
    return ReviewReport(**raw)
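The models above check types and the score range, but not that the findings cover the five categories from the prompt or that each status is PASS or FAIL. Here is a rough sketch of that extra check on the raw dictionaries, using only the standard library. `check_findings` and both constants are hypothetical names I am introducing for illustration.

```python
EXPECTED_CATEGORIES = {"observability", "error handling", "security", "scalability", "deployment"}
ALLOWED_STATUSES = {"PASS", "FAIL"}

def check_findings(findings: list[dict]) -> list[str]:
    """Return a list of problems; an empty list means the findings look consistent."""
    problems = []
    seen = []
    for item in findings:
        # Normalize case and whitespace so "Security " and "security" compare equal.
        category = str(item.get("category", "")).strip().lower()
        status = str(item.get("status", "")).strip().upper()
        if category not in EXPECTED_CATEGORIES:
            problems.append(f"unexpected category: {category!r}")
        elif category in seen:
            problems.append(f"duplicate category: {category!r}")
        else:
            seen.append(category)
        if status not in ALLOWED_STATUSES:
            problems.append(f"invalid status for {category!r}: {status!r}")
    # Anything the prompt asked for but the model skipped.
    for missing in sorted(EXPECTED_CATEGORIES - set(seen)):
        problems.append(f"missing category: {missing!r}")
    return problems
```

Calling `check_findings(raw["findings"])` after `parse_report(raw)` succeeds would surface a skipped or invented category before the report is printed.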

Step 4: Build the CLI

I wrap the reviewer in a small argparse CLI so engineers can point it at any markdown file. Because Oxlo.ai uses flat per-request pricing, running this on a ten-page RFC costs the same as a one-line prompt. See https://oxlo.ai/pricing for details.

import argparse

def main():
    parser = argparse.ArgumentParser(description="Production readiness reviewer")
    parser.add_argument("file", help="Path to design doc markdown file")
    args = parser.parse_args()

    with open(args.file, "r", encoding="utf-8") as f:
        design_text = f.read()

    raw = review_design(design_text)
    report = parse_report(raw)

    print(f"Score: {report.score}/10")
    print(f"Summary: {report.summary}\n")
    for item in report.findings:
        icon = "PASS" if item.status.upper() == "PASS" else "FAIL"
        print(f"[{icon}] {item.category}: {item.recommendation}")

if __name__ == "__main__":
    main()
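If the CLI is going to gate anything, it helps to turn the report into an exit code. The sketch below is one way to do that, not what my script currently does. `exit_code_for` is a hypothetical helper, and the default threshold of 7 is an arbitrary assumption you would tune for your team.

```python
def exit_code_for(score: int, statuses: list[str], min_score: int = 7) -> int:
    """Return 0 when the review passes the gate and 1 otherwise."""
    # Treat anything other than PASS (case-insensitive) as a failed category.
    any_failed = any(status.upper() != "PASS" for status in statuses)
    return 1 if any_failed or score < min_score else 0
```

At the end of `main()`, this could be used as `sys.exit(exit_code_for(report.score, [f.status for f in report.findings]))`, with `import sys` added at the top of the file.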

Run it

Save the code from the steps above as review.py, then create a sample design document and run the script. The model should return strict JSON, which the CLI renders as a readable report.

$ cat > design.md <<'EOF'
## Auth Service v2

A new Python FastAPI service handling user authentication against PostgreSQL. Logs print to stdout. Deployed via Docker on a single EC2 instance. No retry logic for DB connections. No monitoring or alerting configured.
EOF

$ python review.py design.md

Example output:

Score: 4/10
Summary: The design covers basic functionality but lacks critical production concerns including observability, resilience, and horizontal scalability.

[FAIL] observability: Add structured logging and export metrics to a monitoring system such as Prometheus.
[FAIL] error handling: Implement retries with exponential backoff for database connections.
[FAIL] security: Document secrets management and TLS termination strategy.
[FAIL] scalability: Stateless design is acceptable, but add load balancing and auto-scaling before production.
[PASS] deployment: Containerization with Docker is a solid start.

Next steps

Wire the script into a CI pipeline so every RFC pull request gets an automatic review comment. You can also add a second Oxlo.ai call using the vision-enabled kimi-k2.6 model to review architecture diagrams when they are attached to the design doc.
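For the CI idea, the review first has to become comment text. Here is a minimal sketch that renders the parsed fields as a markdown table. `render_comment` is a hypothetical helper, and posting the result to a pull request is left to whatever your CI system provides.

```python
def render_comment(score: int, summary: str, findings: list[dict]) -> str:
    """Format a review as markdown suitable for a pull request comment."""
    lines = [
        f"### Production readiness review: {score}/10",
        "",
        summary,
        "",
        "| Category | Status | Recommendation |",
        "| --- | --- | --- |",
    ]
    for item in findings:
        # Escape pipes so a recommendation cannot break the table layout.
        recommendation = item["recommendation"].replace("|", "\\|")
        lines.append(f"| {item['category']} | {item['status']} | {recommendation} |")
    return "\n".join(lines)
```

The function takes plain values rather than the Pydantic model, so it can be fed from `raw` directly or from the fields of a validated `ReviewReport`.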