I Used OpenAI + FastAPI to Build a Pet Symptom Checker — Here's the Full Stack Breakdown
Pet owners panic. It's 11 PM, your dog just ate something suspicious, and Google gives you 47 tabs of conflicting advice ranging from "it's fine" to "call 911."
So I built an AI that actually helps. Here's the complete technical breakdown — including the parts that didn't work at first.
The Problem
Most pet health tools are either:
- Glorified keyword matching ("vomiting" → "go to vet")
- $100+/month subscription vet chat services
- Reddit threads from 2014
I wanted something that actually reasons about symptoms, considers species/breed/age, and gives actionable triage advice in plain English.
The Stack
Frontend: Next.js 14 + Tailwind
Backend: FastAPI (Python 3.11)
AI: OpenAI GPT-4o with structured outputs
DB: Supabase (Postgres + pgvector)
Infra: Railway + Vercel
The Core: Structured AI Outputs
The biggest mistake most devs make with AI health tools? Trusting unstructured text output.
Here's the pattern that actually works:
```python
from enum import Enum

from openai import OpenAI
from pydantic import BaseModel

client = OpenAI()

class UrgencyLevel(str, Enum):
    MONITOR = "monitor"      # Watch at home
    VET_SOON = "vet_soon"    # Within 24-48h
    VET_NOW = "vet_now"      # Today, urgent
    EMERGENCY = "emergency"  # Go NOW

class PetTriageResult(BaseModel):
    urgency: UrgencyLevel
    reasoning: str
    possible_causes: list[str]
    home_actions: list[str]
    red_flags_to_watch: list[str]
    disclaimer: str

def analyze_pet_symptoms(
    species: str,
    breed: str,
    age_years: float,
    weight_kg: float,
    symptoms: list[str],
    duration_hours: int,
) -> PetTriageResult:
    system_prompt = """You are a veterinary triage assistant.
Analyze symptoms conservatively — when in doubt, escalate urgency.
Never diagnose, always triage. Focus on actionable next steps."""

    user_prompt = f"""
Patient: {breed} {species}, {age_years} years old, {weight_kg}kg
Symptoms: {', '.join(symptoms)}
Duration: {duration_hours} hours

Provide a structured triage assessment.
"""

    response = client.beta.chat.completions.parse(
        model="gpt-4o-2024-08-06",
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_prompt},
        ],
        response_format=PetTriageResult,
        temperature=0.2,  # Low temp for medical context
    )
    return response.choices[0].message.parsed
```
The `beta.chat.completions.parse()` helper with Pydantic models gives you a guaranteed JSON structure — no more `json.loads()` and praying.
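The guarantee ultimately comes from Pydantic validation: any payload that doesn't match the schema fails loudly instead of leaking into the UI. A minimal sketch (the JSON payloads here are made-up examples, not real model output):

```python
from enum import Enum
from pydantic import BaseModel, ValidationError

# Same shapes as in the article
class UrgencyLevel(str, Enum):
    MONITOR = "monitor"
    VET_SOON = "vet_soon"
    VET_NOW = "vet_now"
    EMERGENCY = "emergency"

class PetTriageResult(BaseModel):
    urgency: UrgencyLevel
    reasoning: str
    possible_causes: list[str]
    home_actions: list[str]
    red_flags_to_watch: list[str]
    disclaimer: str

good = '''{"urgency": "vet_soon", "reasoning": "Persistent vomiting",
"possible_causes": ["dietary indiscretion"], "home_actions": ["withhold food 12h"],
"red_flags_to_watch": ["blood in vomit"], "disclaimer": "Not a diagnosis."}'''

result = PetTriageResult.model_validate_json(good)
print(result.urgency.value)  # vet_soon

# An out-of-enum urgency is rejected instead of silently passing through
bad = good.replace("vet_soon", "probably_fine")
try:
    PetTriageResult.model_validate_json(bad)
except ValidationError as e:
    print("rejected with", e.error_count(), "validation error")
```

The enum on `urgency` is doing the heavy lifting: the frontend only ever has to render four known states.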
The FastAPI Endpoint
```python
import asyncio

from fastapi import FastAPI, HTTPException
from fastapi.middleware.cors import CORSMiddleware
from pydantic import BaseModel

app = FastAPI()

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://mypettherapist.com"],
    allow_methods=["POST"],
    allow_headers=["*"],
)

class SymptomRequest(BaseModel):
    species: str
    breed: str
    age_years: float
    weight_kg: float
    symptoms: list[str]
    duration_hours: int

@app.post("/api/triage")
async def triage_symptoms(request: SymptomRequest):
    try:
        # The OpenAI SDK call is synchronous — run it in a
        # thread pool to avoid blocking the event loop
        result = await asyncio.to_thread(
            analyze_pet_symptoms,
            request.species,
            request.breed,
            request.age_years,
            request.weight_kg,
            request.symptoms,
            request.duration_hours,
        )
        return result.model_dump()
    except Exception as e:
        raise HTTPException(status_code=500, detail=str(e))
```
The Semantic Memory Layer (The Secret Sauce)
Here's where it gets interesting. I store anonymized past triage sessions in pgvector and use them for few-shot context:
```python
import os

from supabase import create_client

SUPABASE_URL = os.environ["SUPABASE_URL"]
SUPABASE_KEY = os.environ["SUPABASE_KEY"]

def get_similar_cases(symptom_embedding: list[float], limit: int = 3) -> list[dict]:
    """Fetch similar historical cases for few-shot context."""
    supabase = create_client(SUPABASE_URL, SUPABASE_KEY)
    result = supabase.rpc(
        'match_pet_cases',
        {
            'query_embedding': symptom_embedding,
            'match_threshold': 0.78,
            'match_count': limit,
        },
    ).execute()
    return result.data

def build_context_prompt(similar_cases: list[dict]) -> str:
    if not similar_cases:
        return ""
    examples = []
    for case in similar_cases:
        examples.append(
            f"Similar case: {case['symptoms']} → "
            f"Urgency: {case['urgency']} | "
            f"Outcome: {case['outcome_summary']}"
        )
    return "\n".join(["Reference cases:"] + examples)
```
This reduced hallucinations by ~40% in my testing — especially for rare breed-specific conditions.
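For intuition, the `match_pet_cases` RPC is essentially a similarity ranking with a cutoff. A pure-Python sketch of that ranking logic, using toy 3-dimensional vectors and made-up case data (real embeddings are high-dimensional vectors from an embedding model):

```python
import math

def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 = same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def match_cases(query: list[float], cases: list[dict],
                threshold: float = 0.78, limit: int = 3) -> list[dict]:
    """Keep cases above the similarity threshold, best-first, capped at limit."""
    scored = [(cosine_similarity(query, c["embedding"]), c) for c in cases]
    scored = [(s, c) for s, c in scored if s >= threshold]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [c for _, c in scored[:limit]]

# Toy data: the first case points roughly the same way as the query
cases = [
    {"symptoms": "vomiting, lethargy", "embedding": [0.9, 0.1, 0.0]},
    {"symptoms": "limping after jump", "embedding": [0.0, 0.2, 0.9]},
]
matches = match_cases([1.0, 0.0, 0.0], cases)
print(matches[0]["symptoms"])  # vomiting, lethargy
```

The 0.78 threshold is what keeps irrelevant cases out of the prompt: a loosely related case dragged in as "context" is worse than no context at all.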
What Failed (Honest Post-Mortem)
Attempt 1: GPT-3.5-turbo — Too inconsistent. Would sometimes classify "lethargic after eating" as EMERGENCY. Switched to GPT-4o, problem mostly solved.
Attempt 2: Streaming responses — Users loved seeing words appear in real-time... until the partial JSON broke everything. Switched to full response + loading spinner. Boring but reliable.
Attempt 3: Multi-species single prompt — Turns out cat and dog physiology is different enough that one prompt performed poorly. Separate system prompts per species, ~30% accuracy improvement.
Attempt 4: Asking users to rate urgency — Confirmation bias is real. People rate "my dog seems tired" as MONITOR even when it's 5 days of lethargy. Removed user urgency input entirely.
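The per-species fix from Attempt 3 can be as simple as a prompt lookup keyed on species, with a generic fallback for exotics. The prompts below are illustrative stand-ins, not the production prompts:

```python
# Illustrative per-species system prompts (production prompts are much longer)
SPECIES_PROMPTS = {
    "dog": (
        "You are a veterinary triage assistant for DOGS. "
        "Weigh breed-specific risks, e.g. bloat in deep-chested breeds."
    ),
    "cat": (
        "You are a veterinary triage assistant for CATS. "
        "Treat more than 24h of not eating as high urgency."
    ),
}

GENERIC_PROMPT = "You are a conservative veterinary triage assistant."

def system_prompt_for(species: str) -> str:
    # Fall back to a generic conservative prompt for exotic species
    # rather than pretending dog physiology applies
    return SPECIES_PROMPTS.get(species.strip().lower(), GENERIC_PROMPT)

print("CATS" in system_prompt_for("Cat"))
```

Boring dictionary dispatch, but it beats one mega-prompt trying to hedge across species in every sentence.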
Performance Numbers
- Average response time: 1.8s (GPT-4o) / 0.9s (GPT-4o-mini fallback)
- P99 latency: 4.2s
- Structured output parse failures: < 0.1%
- Cost per triage: ~$0.008
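The GPT-4o-mini fallback behind those numbers is a plain try-once-then-downgrade wrapper. A sketch with the model call injected as a callable so the control flow is visible without an API key (`call_model` and the `degraded` flag are my illustrative names, not from the original):

```python
from typing import Callable

def triage_with_fallback(call_model: Callable[[str], dict],
                         primary: str = "gpt-4o-2024-08-06",
                         fallback: str = "gpt-4o-mini") -> dict:
    """Try the primary model; on any failure, retry once on the cheaper fallback."""
    try:
        return call_model(primary)
    except Exception:
        result = call_model(fallback)
        result["degraded"] = True  # surface fallback use to the client
        return result

# Simulated model call: the primary "times out", the fallback succeeds
def fake_call(model: str) -> dict:
    if model.startswith("gpt-4o-2024"):
        raise TimeoutError("primary timed out")
    return {"urgency": "monitor", "model": model}

out = triage_with_fallback(fake_call)
print(out["model"])  # gpt-4o-mini
```

Flagging degraded responses matters in a triage context: the frontend can nudge users toward the conservative option when the cheaper model answered.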
Rate Limiting (Don't Skip This)
```python
from fastapi import Request
from slowapi import Limiter, _rate_limit_exceeded_handler
from slowapi.errors import RateLimitExceeded
from slowapi.util import get_remote_address

limiter = Limiter(key_func=get_remote_address)
app.state.limiter = limiter
app.add_exception_handler(RateLimitExceeded, _rate_limit_exceeded_handler)

@app.post("/api/triage")
@limiter.limit("10/minute")  # keyed on client IP
async def triage_symptoms(request: Request, body: SymptomRequest):
    ...  # same handler body as before
```
Without this, a single viral Reddit post will drain your OpenAI credits in 20 minutes. Ask me how I know.
The Actual App
If you want to see this in action (not just the code), the live version is at mypettherapist.com — free to use, no signup required. Works for dogs, cats, and a surprisingly wide range of exotic pets.
TL;DR
- Use structured outputs — Pydantic + `beta.chat.completions.parse()` is non-negotiable for production
- Separate system prompts per species — don't fight physiology with clever prompting
- pgvector for few-shot context — surprisingly effective for domain-specific accuracy
- Low temperature + conservative bias — medical context demands it
- Rate limit before launch — seriously
Building something similar? Drop your questions in the comments — happy to share more implementation details.
What's the weirdest symptom your pet has ever had? 👇