DEV Community

shashank ms

A Practical Guide to Using LLMs for Engineering

We are going to build a design review agent that reads an engineering spec and flags missing context, security gaps, and operational concerns before implementation starts. It runs from a single Python file and returns a structured JSON report you can paste into a PR or ticket. Teams that review specs manually spend hours on consistency checks that a model can handle in seconds, and Oxlo.ai's flat per-request pricing (see https://oxlo.ai/pricing) means long documents do not inflate your bill.

What you'll need

  • Python 3.10 or newer
  • An Oxlo.ai API key from https://portal.oxlo.ai
  • The OpenAI SDK: pip install openai
  • A sample markdown spec to test against

Step 1: Scaffold the client and verify the connection

Start by instantiating the client against Oxlo.ai. I keep the API key in an environment variable. I use Llama 3.3 70B because it handles long context windows reliably, and since Oxlo.ai charges per request rather than per token, feeding it a 3,000-word spec costs the same as a one-liner.

from openai import OpenAI
import os

client = OpenAI(
    base_url="https://api.oxlo.ai/v1",
    api_key=os.environ["OXLO_API_KEY"]
)

# Quick smoke test
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[{"role": "user", "content": "Reply with exactly: Connection OK"}],
)

assert "Connection OK" in response.choices[0].message.content
print("Oxlo.ai client ready")
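
If the environment variable is missing, the snippet above fails with a bare KeyError. A minimal sketch of a friendlier check could look like this; load_api_key is a hypothetical helper of my own naming, not part of the OpenAI SDK or Oxlo.ai.

```python
import os

def load_api_key(env_var: str = "OXLO_API_KEY") -> str:
    """Return the API key from the environment or raise a readable error.

    Hypothetical helper: the name and the error message are illustrative.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set {env_var} before running the reviewer")
    return key
```

You would then pass api_key=load_api_key() when constructing the client.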

Step 2: Define the review rubric

The agent needs a rigid rubric, or it will invent its own criteria. I store the prompt in a module-level constant so it can be versioned in git. The rubric checks goals, non-goals, dependencies, security, observability, rollback plans, and capacity.

SYSTEM_PROMPT = """You are a staff engineer performing a design review.
Read the provided engineering spec and evaluate it against the rubric below.
Return ONLY a JSON object. Do not wrap it in markdown.

Rubric categories (score 0-10):
1. goals: Are the goals and success criteria explicit?
2. non_goals: Are boundaries and out-of-scope items listed?
3. dependencies: Are upstream and downstream dependencies identified?
4. security: Are threat model, authz, and data handling addressed?
5. observability: Are metrics, logs, and alerts defined?
6. rollback: Is there a rollback or kill-switch plan?
7. capacity: Are load estimates and scaling assumptions stated?

For each category scoring below 7, emit a finding with a specific recommendation.
Also provide an overall summary and priority-ordered next steps.

JSON schema:
{
  "scores": {"goals": 8, "non_goals": 5, ...},
  "findings": [
    {"category": "non_goals", "score": 5, "issue": "...", "recommendation": "..."}
  ],
  "summary": "...",
  "next_steps": ["...", "..."]
}
"""

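The prompt describes a schema, but nothing on the Python side checks that the model followed it. The sketch below is one way to do that, assuming the seven category names and the below-7 finding rule from the rubric above; validate_report and CATEGORIES are hypothetical names, not part of any library.

```python
CATEGORIES = ("goals", "non_goals", "dependencies", "security",
              "observability", "rollback", "capacity")

def _is_score(value) -> bool:
    # bool is a subclass of int, so exclude it explicitly
    return (isinstance(value, (int, float))
            and not isinstance(value, bool)
            and 0 <= value <= 10)

def validate_report(report: dict, threshold: int = 7) -> list[str]:
    """Return a list of problems; an empty list means the report matches the rubric."""
    problems = []
    scores = report.get("scores")
    if not isinstance(scores, dict):
        return ["'scores' is missing or not an object"]
    for category in CATEGORIES:
        if not _is_score(scores.get(category)):
            problems.append(f"score for '{category}' is missing or outside 0-10")
    findings = report.get("findings")
    if not isinstance(findings, list):
        problems.append("'findings' is missing or not a list")
        findings = []
    # The rubric asks for a finding on every category scoring below the threshold
    flagged = {f.get("category") for f in findings if isinstance(f, dict)}
    for category in CATEGORIES:
        score = scores.get(category)
        if _is_score(score) and score < threshold and category not in flagged:
            problems.append(f"'{category}' scored {score} but has no finding")
    if not isinstance(report.get("summary"), str):
        problems.append("'summary' is missing or not a string")
    if not isinstance(report.get("next_steps"), list):
        problems.append("'next_steps' is missing or not a list")
    return problems
```

An empty return value means the report is safe to print; anything else is a reason to retry the request or surface the problem to the user.
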
Step 3: Build the review function with JSON mode

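Now I wrap the API call in a function that accepts a spec string and returns parsed JSON. Oxlo.ai supports JSON mode on all chat models, so I set response_format to {"type": "json_object"} to get syntactically valid JSON back. As the OpenAI API defines it, JSON mode only ensures that the reply parses; it does not check the reply against the schema in the prompt, so rubric keys can still be missing.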

import json

def review_spec(spec_text: str) -> dict:
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": f"Engineering spec:\n\n{spec_text}"},
        ],
        response_format={"type": "json_object"},
    )
    raw = response.choices[0].message.content
    return json.loads(raw)
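
Calling json.loads directly on the reply assumes it is always a bare JSON object. If you ever run the agent against a model or endpoint without JSON mode, the reply may arrive wrapped in a markdown fence. Here is a minimal sketch of a more forgiving parser; parse_model_json is a hypothetical helper, and the fence-stripping rule is my assumption about how such replies usually look.

```python
import json
import re

# Matches a reply that is entirely one fenced block, with an optional "json" tag
_FENCE = re.compile(r"^```(?:json)?\s*\n(.*?)\n?```\s*$", re.DOTALL)

def parse_model_json(raw: str) -> dict:
    """Parse a model reply into a dict, tolerating a surrounding markdown fence."""
    text = (raw or "").strip()
    match = _FENCE.match(text)
    if match:
        text = match.group(1)
    try:
        data = json.loads(text)
    except json.JSONDecodeError as exc:
        raise ValueError(f"model did not return valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object at the top level")
    return data
```

Swapping json.loads(raw) for parse_model_json(raw) in review_spec turns a confusing traceback into a single ValueError you can catch and retry on.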

Step 4: Add severity classification

A raw score is useful, but I want anything involving auth or data integrity escalated. I add a second pass that tags each finding with LOW, MEDIUM, HIGH, or CRITICAL. Because this is a separate API call, Oxlo.ai's flat per-request pricing keeps the cost predictable even when you chain multiple reasoning steps.

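SEVERITY_PROMPT = """You are an SRE triaging engineering findings.
Given the JSON findings below, assign each a severity: LOW, MEDIUM, HIGH, or CRITICAL.
CRITICAL means auth, PII, data loss, or missing rollback without a guardrail.
Return a JSON object of the form {"findings": [...]} that keeps every item unchanged
and adds a 'severity' field to each."""

def classify_severity(findings: list) -> list:
    if not findings:
        return []  # nothing to triage, so skip the API call
    response = client.chat.completions.create(
        model="llama-3.3-70b",
        messages=[
            {"role": "system", "content": SEVERITY_PROMPT},
            # JSON mode returns an object, so send and expect a "findings" wrapper
            {"role": "user", "content": json.dumps({"findings": findings}, indent=2)},
        ],
        response_format={"type": "json_object"},
    )
    data = json.loads(response.choices[0].message.content)
    classified = data.get("findings", findings)
    # Guarantee the field the CLI prints, even if the model omitted it
    for item in classified:
        item.setdefault("severity", "UNKNOWN")
    return classified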
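
Since the goal is to escalate anything involving auth or data integrity, it may be worth not relying on the model alone for that. The sketch below applies a deterministic minimum severity per category after the model has answered. The SEVERITY_FLOOR mapping is a hypothetical policy I made up for illustration, as is the choice to treat unrecognized labels as MEDIUM.

```python
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]

# Hypothetical policy: categories that must never be triaged below this level
SEVERITY_FLOOR = {"security": "HIGH", "rollback": "HIGH"}

def apply_severity_floor(findings: list[dict]) -> list[dict]:
    """Return copies of the findings with severity normalized and floors applied."""
    result = []
    for finding in findings:
        item = dict(finding)  # copy so the caller's data is not mutated
        label = str(item.get("severity", "")).upper()
        if label not in SEVERITY_ORDER:
            label = "MEDIUM"  # assumption: unknown or missing labels get a middle value
        floor = SEVERITY_FLOOR.get(item.get("category"), "LOW")
        if SEVERITY_ORDER.index(label) < SEVERITY_ORDER.index(floor):
            label = floor
        item["severity"] = label
        result.append(item)
    return result
```

Running classify_severity first and apply_severity_floor second keeps the model's judgment for most findings while making the escalation rule auditable in code.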

Step 5: Wire up the CLI

Finally, I add a small CLI that reads a markdown file, runs the review, classifies severity, and prints a formatted report.

import sys

def main(file_path: str):
    with open(file_path, "r", encoding="utf-8") as f:
        spec = f.read()

    result = review_spec(spec)
    result["findings"] = classify_severity(result["findings"])

    print("=" * 60)
    print("DESIGN REVIEW REPORT")
    print("=" * 60)
    print(f"Overall summary: {result['summary']}\n")

    for cat, score in result["scores"].items():
        print(f"{cat:15s}: {score}/10")
    print()

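    for finding in result["findings"]:
        # Fall back to UNKNOWN if the severity pass did not tag this finding
        severity = finding.get("severity", "UNKNOWN")
        print(f"[{severity:8s}] {finding['category']}: {finding['issue']}")
        print(f"           Recommendation: {finding['recommendation']}\n")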

    print("Next steps:")
    for step in result["next_steps"]:
        print(f"  - {step}")

if __name__ == "__main__":
    if len(sys.argv) < 2:
        print("Usage: python reviewer.py design.md")
        sys.exit(1)
    main(sys.argv[1])
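
If the script runs in CI, a non-zero exit code is the simplest way to block a merge on serious findings. This is a rough sketch under that assumption; exit_code_for and the default HIGH threshold are hypothetical choices, not something the original script does.

```python
SEVERITY_RANK = {"LOW": 0, "MEDIUM": 1, "HIGH": 2, "CRITICAL": 3}

def exit_code_for(findings: list[dict], fail_on: str = "HIGH") -> int:
    """Return 1 if any finding is at or above `fail_on`, otherwise 0."""
    threshold = SEVERITY_RANK[fail_on]
    for finding in findings:
        label = str(finding.get("severity", "")).upper()
        # Unrecognized labels rank below LOW, so they never fail the build
        if SEVERITY_RANK.get(label, -1) >= threshold:
            return 1
    return 0
```

Calling sys.exit(exit_code_for(result["findings"])) at the end of main would make the job fail whenever a HIGH or CRITICAL finding is present.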

Run it

Create a file named design.md with a thin spec, then run the script.

# Service X Migration

## Goals
Move user data from Postgres to DynamoDB.

## Non-Goals
None.

## Dependencies
Internal auth service.

## Rollback
TBD.

Run the agent:

$ export OXLO_API_KEY=your-api-key
$ python reviewer.py design.md

Example output:

============================================================
DESIGN REVIEW REPORT
============================================================
Overall summary: The spec identifies a migration goal but lacks non-goals, capacity planning, observability, and a concrete rollback strategy.

goals          : 8/10
non_goals      : 2/10
dependencies   : 6/10
security       : 4/10
observability  : 1/10
rollback       : 2/10
capacity       : 0/10

[CRITICAL] rollback: No rollback plan defined for a data migration.
           Recommendation: Define a backward-compatible write path and a revert script with data validation before cutover.

[HIGH    ] observability: No metrics, alarms, or log aggregation strategy listed.
           Recommendation: Add row-count reconciliation metrics and p99 latency alarms for both stores during dual-write.

[MEDIUM  ] non_goals: Section is empty, allowing scope creep.
           Recommendation: Explicitly list out-of-scope features, such as historical data backfill and real-time sync.

Next steps:
  - Draft a detailed rollback runbook with go/no-go criteria.
  - Add observability requirements including dashboards and paging thresholds.
  - Fill capacity estimates for peak write throughput.

Wrap-up

This agent replaces the first-pass consistency check I used to do manually. If you want to extend it, wire it into a GitHub Action so every PR containing a design doc gets an automatic review comment. You could also swap in DeepSeek V3.2 or Qwen 3 32B on Oxlo.ai for heavier reasoning workloads by changing only the model string, because the client is fully OpenAI-compatible.
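
For the review-comment idea, the report needs to become Markdown before anything can post it. Here is a minimal sketch of that rendering step, assuming the report dict has the scores, findings, summary, and next_steps keys from the schema above; render_markdown is a hypothetical helper, and posting the comment itself is left out.

```python
def render_markdown(report: dict) -> str:
    """Render the review report as a Markdown comment body."""
    lines = ["## Design review", "", report.get("summary", ""), "",
             "| Category | Score |", "| --- | --- |"]
    for category, score in report.get("scores", {}).items():
        lines.append(f"| {category} | {score}/10 |")
    findings = report.get("findings", [])
    if findings:
        lines += ["", "### Findings"]
        for finding in findings:
            severity = finding.get("severity", "UNKNOWN")
            category = finding.get("category", "?")
            lines.append(f"- **{severity}** `{category}`: {finding.get('issue', '')}")
            lines.append(f"  - Recommendation: {finding.get('recommendation', '')}")
    steps = report.get("next_steps", [])
    if steps:
        lines += ["", "### Next steps"]
        lines += [f"{i}. {step}" for i, step in enumerate(steps, 1)]
    return "\n".join(lines)
```

The returned string can be written to a file in the workflow and handed to whatever step posts PR comments in your setup.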
