We are going to build a system design review agent that reads an engineering design document and returns a structured critique covering scalability, failure modes, and tradeoffs. It helps staff engineers and tech leads enforce consistency during design reviews without becoming a bottleneck. Because Oxlo.ai uses flat per-request pricing, you can feed it long documents without watching the cost scale with every extra paragraph.
What you'll need
Before starting, grab an Oxlo.ai API key from https://portal.oxlo.ai. You will also need Python 3.10 or newer and the OpenAI SDK.
pip install openai
Step 1: Verify connectivity with Oxlo.ai
I always start with a smoke test. This snippet confirms the client can authenticate and reach the API.
from openai import OpenAI
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
response = client.chat.completions.create(
    model="llama-3.3-70b",
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Reply with OK"},
    ],
)
print(response.choices[0].message.content)
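Hard-coding the key is fine for a smoke test, but it is easy to commit by accident. Here is a minimal sketch of reading it from the environment instead. The variable name `OXLO_API_KEY` and the helper `load_api_key` are my own choices for illustration, not something Oxlo.ai prescribes.

```python
import os


def load_api_key(env=None, var_name="OXLO_API_KEY"):
    """Return the API key from the environment, or raise a clear error.

    OXLO_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    key = env.get(var_name, "").strip()
    if not key:
        raise RuntimeError(f"Set {var_name} before running the agent")
    return key


# Usage with the client from the smoke test (requires the openai package):
# client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key=load_api_key())
```

Passing a plain dict as `env` keeps the helper easy to exercise without touching the real environment.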
Step 2: Define the system prompts
I split the agent into two phases: asking clarifying questions and delivering the final review. Keeping the prompts in separate variables makes iteration easier.
QUESTIONS_PROMPT = """You are a staff engineer reviewing a system design doc.
Read the proposal and ask up to three clarifying questions that would help you evaluate scalability, reliability, security, and operational complexity.
Output only the questions, one per line, prefixed with a dash."""
REVIEW_PROMPT = """You are a staff engineer performing a final design review.
Evaluate the proposed system on these axes:
- Scalability
- Reliability and failure modes
- Security
- Operational complexity
- Cost efficiency
Return your analysis as a JSON object with these keys:
- summary: a one sentence verdict
- scores: an object with integer scores from 1 to 5 for each axis
- strengths: list of strings
- risks: list of strings
- actionable: list of concrete next steps
Be critical but constructive."""
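One way to keep the axis list from drifting as you iterate is to generate the review prompt from a single list. This is only a sketch: `AXES` and `build_review_prompt` are hypothetical names, and the wording simply mirrors the prompt above.

```python
AXES = [
    "Scalability",
    "Reliability and failure modes",
    "Security",
    "Operational complexity",
    "Cost efficiency",
]


def build_review_prompt(axes):
    """Assemble the review prompt text from a list of axis names."""
    axis_lines = "\n".join(f"- {axis}" for axis in axes)
    return (
        "You are a staff engineer performing a final design review.\n"
        "Evaluate the proposed system on these axes:\n"
        f"{axis_lines}\n"
        "Return your analysis as a JSON object with these keys:\n"
        "- summary: a one sentence verdict\n"
        "- scores: an object with integer scores from 1 to 5 for each axis\n"
        "- strengths: list of strings\n"
        "- risks: list of strings\n"
        "- actionable: list of concrete next steps\n"
        "Be critical but constructive."
    )


REVIEW_PROMPT = build_review_prompt(AXES)
```

Adding or removing an axis then becomes a one-line change to the list.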
Step 3: Generate clarifying questions
The first pass sends the design doc to Kimi K2.6, which handles long context and reasoning well. We print the questions so the user can answer them before we produce the final review.
from openai import OpenAI
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
design_doc = """# URL Shortener v2
We will use a single PostgreSQL primary with read replicas.
Application: Python Flask.
Caching: Redis, to be added later.
Traffic: 1M redirects/day.
Uptime target: 99.9%.
"""
response = client.chat.completions.create(
    model="kimi-k2.6",
    messages=[
        {"role": "system", "content": QUESTIONS_PROMPT},
        {"role": "user", "content": design_doc},
    ],
)
questions = response.choices[0].message.content
print(questions)
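If you would rather work with the questions individually than as one string, the dash-prefixed format is simple to split. A small sketch, assuming the model follows the "one per line, prefixed with a dash" instruction; `parse_questions` is a hypothetical helper, and lines that do not start with a dash are ignored.

```python
def parse_questions(raw):
    """Split the model's reply into questions, one per dash-prefixed line."""
    questions = []
    for line in raw.splitlines():
        line = line.strip()
        if line.startswith("-"):
            # Drop the leading dash(es) and surrounding whitespace.
            text = line.lstrip("-").strip()
            if text:
                questions.append(text)
    return questions
```

The resulting list can then be shown to the user one question at a time.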
Step 4: Produce the structured review
After collecting answers, the second pass uses DeepSeek V3.2 with JSON mode so the report comes back as machine-readable JSON. This makes it straightforward to store results in a database or render them in a dashboard.
import json
from openai import OpenAI
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
answers = """Expected peak write throughput is 100 short URLs per second.
Redis will use TTL-based expiration with a fallback to the database.
We will run in a single AWS region for now."""
response = client.chat.completions.create(
    model="deepseek-v3.2",
    messages=[
        {"role": "system", "content": REVIEW_PROMPT},
        {"role": "user", "content": f"Design doc:\n{design_doc}\n\nClarifying answers:\n{answers}"},
    ],
    response_format={"type": "json_object"},
)
review = json.loads(response.choices[0].message.content)
print(json.dumps(review, indent=2))
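JSON mode in OpenAI-compatible APIs generally guarantees syntactically valid JSON rather than a particular set of keys, and I am assuming Oxlo.ai behaves the same way. A sketch of a structural check against the keys the prompt asks for; `EXPECTED_KEYS` and `validate_review` are hypothetical names.

```python
EXPECTED_KEYS = {"summary", "scores", "strengths", "risks", "actionable"}


def validate_review(review):
    """Return a list of problems found in a parsed review; empty means OK."""
    if not isinstance(review, dict):
        return ["review is not a JSON object"]
    problems = [f"missing key: {key}" for key in sorted(EXPECTED_KEYS - review.keys())]
    if not isinstance(review.get("summary", ""), str):
        problems.append("summary is not a string")
    scores = review.get("scores", {})
    if not isinstance(scores, dict):
        problems.append("scores is not an object")
    else:
        for axis, value in scores.items():
            # bool is a subclass of int in Python, so exclude it explicitly.
            is_int = isinstance(value, int) and not isinstance(value, bool)
            if not is_int or not 1 <= value <= 5:
                problems.append(f"score for {axis!r} is not an integer from 1 to 5")
    for key in ("strengths", "risks", "actionable"):
        value = review.get(key, [])
        if not isinstance(value, list) or not all(isinstance(v, str) for v in value):
            problems.append(f"{key} is not a list of strings")
    return problems
```

Calling `validate_review(review)` right after `json.loads` lets you reject or re-request a malformed report before it reaches a database or dashboard.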
Step 5: Wrap it in a CLI
Finally, I combine both passes into a single script that reads a file, prints questions, waits for stdin, then prints the final JSON review.
import argparse
import json
import sys
from openai import OpenAI
client = OpenAI(base_url="https://api.oxlo.ai/v1", api_key="YOUR_OXLO_API_KEY")
QUESTIONS_PROMPT = """You are a staff engineer reviewing a system design doc.
Read the proposal and ask up to three clarifying questions that would help you evaluate scalability, reliability, security, and operational complexity.
Output only the questions, one per line, prefixed with a dash."""
REVIEW_PROMPT = """You are a staff engineer performing a final design review.
Evaluate the proposed system on these axes:
- Scalability
- Reliability and failure modes
- Security
- Operational complexity
- Cost efficiency
Return your analysis as a JSON object with these keys:
- summary: a one sentence verdict
- scores: an object with integer scores from 1 to 5 for each axis
- strengths: list of strings
- risks: list of strings
- actionable: list of concrete next steps
Be critical but constructive."""
def run_review(design_doc_path: str):
    with open(design_doc_path, "r", encoding="utf-8") as f:
        design_doc = f.read()

    # Pass 1: ask the model for clarifying questions about the design.
    q_response = client.chat.completions.create(
        model="kimi-k2.6",
        messages=[
            {"role": "system", "content": QUESTIONS_PROMPT},
            {"role": "user", "content": design_doc},
        ],
    )
    questions = q_response.choices[0].message.content
    print("=== CLARIFYING QUESTIONS ===")
    print(questions)

    # Read the user's answers from stdin until end-of-file.
    print("\nPaste your answers followed by Ctrl+D (Unix) or Ctrl+Z then Enter (Windows):")
    answers = sys.stdin.read()

    # Pass 2: produce the structured review in JSON mode.
    r_response = client.chat.completions.create(
        model="deepseek-v3.2",
        messages=[
            {"role": "system", "content": REVIEW_PROMPT},
            {"role": "user", "content": f"Design doc:\n{design_doc}\n\nClarifying answers:\n{answers}"},
        ],
        response_format={"type": "json_object"},
    )
    review = json.loads(r_response.choices[0].message.content)
    print("\n=== FINAL REVIEW ===")
    print(json.dumps(review, indent=2))


if __name__ == "__main__":
    parser = argparse.ArgumentParser(description="System design review agent powered by Oxlo.ai")
    parser.add_argument("design_doc", help="Path to design document markdown file")
    args = parser.parse_args()
    run_review(args.design_doc)
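The script calls `json.loads` directly on the reply, so any stray text around the object would end the run with a traceback. Here is a defensive sketch, assuming the worst case is a JSON object wrapped in extra text such as a Markdown fence. `extract_json_object` is a hypothetical helper, and I have not observed whether the models above ever need it.

```python
import json


def extract_json_object(raw):
    """Parse `raw` as a JSON object, tolerating text around the object."""
    try:
        parsed = json.loads(raw)
    except json.JSONDecodeError:
        # Fall back to the span between the first "{" and the last "}".
        start, end = raw.find("{"), raw.rfind("}")
        if start == -1 or end <= start:
            raise ValueError("no JSON object found in model reply") from None
        # This may still raise json.JSONDecodeError if the span is invalid.
        parsed = json.loads(raw[start : end + 1])
    if not isinstance(parsed, dict):
        raise ValueError("model reply is valid JSON but not an object")
    return parsed
```

In `run_review`, this could stand in for the bare `json.loads` call, with the `ValueError` caught to print a short message instead of a stack trace.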
Run it
Save the script from Step 5 as review_agent.py and the design doc below as design.md, then invoke the agent.
python review_agent.py design.md
Example design doc:
# URL Shortener v2
We will use a single PostgreSQL primary with read replicas.
Application: Python Flask.
Caching: Redis, to be added later.
Traffic: 1M redirects/day.
Uptime target: 99.9%.
Example clarifying questions from the agent:
- What is the expected peak write throughput for new short URLs?
- How will you handle cache warming and invalidation once Redis is introduced?
- Is there a plan for multi-region failover?
After you answer, the final JSON output will look similar to this:
{
  "summary": "The design is pragmatic for launch but carries single-region and single-write-node risks.",
  "scores": {
    "scalability": 3,
    "reliability and failure modes": 3,
    "security": 3,
    "operational complexity": 4,
    "cost efficiency": 4
  },
  "strengths": [
    "Read replicas offload analytics queries",
    "Flask simplicity keeps operational overhead low"
  ],
  "risks": [
    "PostgreSQL primary is a single point of write failure",
    "No cache layer on day one invites load spikes",
    "Single region deployment misses 99.9% if AZ fails"
  ],
  "actionable": [
    "Add connection pooling before launch",
    "Document cache invalidation strategy in phase two",
    "Run a tabletop exercise for primary DB failover"
  ]
}
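Since the review is plain JSON, persisting it takes only a few lines. A sketch using Python's built-in `sqlite3` module; the table name `reviews`, its columns, and the helper `save_review` are illustrative assumptions, not part of the agent above.

```python
import json
import sqlite3


def save_review(conn, doc_path, review):
    """Insert one review row and return its row id."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS reviews (
               id INTEGER PRIMARY KEY,
               doc_path TEXT NOT NULL,
               summary TEXT,
               min_score INTEGER,
               payload TEXT NOT NULL
           )"""
    )
    scores = review.get("scores", {})
    # Store the lowest axis score so weak designs are easy to query for.
    min_score = min(scores.values()) if scores else None
    cursor = conn.execute(
        "INSERT INTO reviews (doc_path, summary, min_score, payload) VALUES (?, ?, ?, ?)",
        (doc_path, review.get("summary"), min_score, json.dumps(review)),
    )
    conn.commit()
    return cursor.lastrowid
```

For example, `save_review(sqlite3.connect("reviews.db"), "design.md", review)` would append the report to a local file, keeping the full JSON in `payload` for later rendering.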
Next steps
Wire this agent into a GitHub Action so it automatically comments on pull requests that touch docs/design/. If you want to go further, extend the agent with vision support by feeding architecture diagrams to Oxlo.ai's vision models such as Kimi VL A3B or Gemma 3 27B.
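For a pull request comment, the JSON needs to become readable text first. A sketch of a renderer, assuming the review has the keys requested in the prompt; `render_review_markdown` is a hypothetical helper, and actually posting the comment from the workflow is left out.

```python
def render_review_markdown(review):
    """Render a parsed review dict as a Markdown string for a PR comment."""
    lines = ["## Design review", "", review.get("summary", "(no summary)"), ""]
    scores = review.get("scores", {})
    if scores:
        lines += ["| Axis | Score |", "| --- | --- |"]
        lines += [f"| {axis} | {score}/5 |" for axis, score in scores.items()]
        lines.append("")
    sections = (("Strengths", "strengths"), ("Risks", "risks"), ("Next steps", "actionable"))
    for title, key in sections:
        items = review.get(key, [])
        if items:  # Skip empty sections entirely.
            lines.append(f"### {title}")
            lines += [f"- {item}" for item in items]
            lines.append("")
    return "\n".join(lines).rstrip() + "\n"
```

The returned string can be written to a file or passed to whatever step in your workflow creates the comment.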