DEV Community

Esther Studer
I Built an AI That Analyzes Your Pet's Behavior — Here's the Architecture

Have you ever stared at your dog doing something weird and thought: "Is this normal, or should I be worried?"

That question sent me down a rabbit hole that ended with a multimodal AI pipeline capable of analyzing pet behavior from photos, videos, and plain-text descriptions. Here's how it actually works — warts and all.


The Problem With Pet Health

Vets are expensive. Google is terrifying ("dog sneezes → lung cancer"). And most pet owners have zero framework for distinguishing quirky behavior from a genuine red flag.

What if you could describe what your pet is doing and get a structured, expert-informed analysis in seconds?

```python
# What we wanted the API to feel like
response = analyze_pet_behavior(
    description="My cat keeps sitting in front of the water bowl but not drinking.",
    pet_type="cat",
    age_years=3,
    photo_url="https://example.com/cat.jpg"  # optional
)

print(response.assessment)
# → "This behavior may indicate dental pain, nausea, or early kidney issues.
#    Hydration monitoring recommended. Vet visit if persists >48h."
```

Simple interface. Complex backend. Let's dig in.


Architecture Overview

```
┌─────────────────────────────────────────────────────┐
│                   User Input Layer                  │
│   Text Description | Photo Upload | Video Clip      │
└────────────────┬────────────────────────────────────┘
                 │
┌────────────────▼────────────────────────────────────┐
│              Preprocessing Pipeline                 │
│  • Text: intent extraction, symptom normalization   │
│  • Image: CLIP embeddings + object detection        │
│  • Video: frame sampling + motion vectors           │
└────────────────┬────────────────────────────────────┘
                 │
┌────────────────▼────────────────────────────────────┐
│           Multimodal LLM Analysis Core              │
│  GPT-4o / Claude for reasoning                      │
│  RAG over veterinary knowledge base                 │
│  Structured output (Pydantic schemas)               │
└────────────────┬────────────────────────────────────┘
                 │
┌────────────────▼────────────────────────────────────┐
│              Response Composer                      │
│  Severity scoring | Actionable steps | Disclaimers  │
└─────────────────────────────────────────────────────┘
```
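In code, the four stages collapse into a thin orchestrator. Here's a minimal skeleton of that shape; every stage function below is a stand-in, with the real implementations covered in the steps that follow:

```python
from dataclasses import dataclass, field

@dataclass
class AnalysisResult:
    assessment: str
    severity_score: int
    sources: list = field(default_factory=list)

def preprocess(description: str, pet_type: str) -> dict:
    # Stand-in for symptom normalization (Step 1)
    return {"symptoms": [description.lower()], "pet_type": pet_type}

def retrieve(features: dict) -> list:
    # Stand-in for RAG retrieval over the veterinary KB (Step 2)
    return [f"kb passage about {s}" for s in features["symptoms"]]

def reason(features: dict, context: list) -> dict:
    # Stand-in for the multimodal LLM analysis core
    return {"assessment": f"Observed: {features['symptoms'][0]}", "severity": 2}

def compose(raw: dict, context: list) -> AnalysisResult:
    # Stand-in for severity scoring + disclaimers (Step 4)
    return AnalysisResult(raw["assessment"], raw["severity"], context)

def analyze_pet_behavior(description: str, pet_type: str) -> AnalysisResult:
    features = preprocess(description, pet_type)
    context = retrieve(features)
    raw = reason(features, context)
    return compose(raw, context)
```

Keeping each stage behind its own function is what makes the pipeline auditable later: you can log and inspect every intermediate artifact.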

Step 1: Symptom Normalization

Users describe the same thing a hundred different ways:

  • "my dog is shaking"
  • "trembling golden retriever"
  • "keeps shivering even though it's warm"

We normalize these into structured symptom vectors before sending them to the LLM. This dramatically improves RAG retrieval accuracy.

```python
from pydantic import BaseModel
from typing import List, Optional

class NormalizedSymptom(BaseModel):
    canonical_term: str                   # e.g. "involuntary_tremor"
    body_region: Optional[str] = None     # e.g. "full_body", "hindquarters"
    duration_hint: Optional[str] = None   # e.g. "intermittent", "persistent"
    severity_hint: Optional[str] = None   # e.g. "mild", "severe"

class SymptomExtractionResult(BaseModel):
    symptoms: List[NormalizedSymptom]
    pet_type: str
    behavior_context: str
    urgency_flag: bool

# Extraction prompt (simplified)
EXTRACTION_PROMPT = """
You are a veterinary triage assistant. Extract structured symptoms
from this pet owner's description. Be conservative with urgency_flag —
only set it to true if symptoms suggest immediate danger.

Description: {description}
Pet type: {pet_type}

Return JSON matching the SymptomExtractionResult schema.
"""
```
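Whatever model call produces the reply, the important part is validating it against the schema rather than trusting raw JSON. A sketch with Pydantic v2 (the reply string is a made-up example of what the model might return, and the schemas are repeated so the snippet runs standalone):

```python
from typing import List, Optional
from pydantic import BaseModel

class NormalizedSymptom(BaseModel):
    canonical_term: str
    body_region: Optional[str] = None
    duration_hint: Optional[str] = None
    severity_hint: Optional[str] = None

class SymptomExtractionResult(BaseModel):
    symptoms: List[NormalizedSymptom]
    pet_type: str
    behavior_context: str
    urgency_flag: bool

# Hypothetical model reply for "my dog keeps shaking"
reply = """
{
  "symptoms": [
    {"canonical_term": "involuntary_tremor", "body_region": "full_body",
     "duration_hint": "persistent"}
  ],
  "pet_type": "dog",
  "behavior_context": "shaking at rest in a warm room",
  "urgency_flag": false
}
"""

# Raises pydantic.ValidationError if the model drifted from the schema
extraction = SymptomExtractionResult.model_validate_json(reply)
```

A `ValidationError` here is a feature: it's the signal that triggers the retry loop described later, instead of letting malformed output flow downstream.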

Step 2: RAG Over Veterinary Knowledge

This is where the magic happens (and where most of the debugging happened).

We built a knowledge base from:

  • Open veterinary textbooks (public domain)
  • AVMA guidelines
  • Synthesized Q&A pairs reviewed by actual vets

Embeddings: text-embedding-3-small (fast, cheap, good enough)
Vector store: Qdrant (self-hosted, ~2GB for full corpus)

```python
from typing import List

import qdrant_client
from qdrant_client import models
from openai import OpenAI

client = OpenAI()
qdrant = qdrant_client.QdrantClient("localhost", port=6333)

def retrieve_veterinary_context(
    symptoms: List[NormalizedSymptom],
    pet_type: str,
    top_k: int = 5
) -> List[str]:
    # Build a rich query from normalized symptoms
    query_text = f"{pet_type}: " + ", ".join(
        f"{s.canonical_term} ({s.body_region or 'general'})"
        for s in symptoms
    )

    # Embed the query
    embedding = client.embeddings.create(
        input=query_text,
        model="text-embedding-3-small"
    ).data[0].embedding

    # Search, restricted to this species
    results = qdrant.search(
        collection_name="veterinary_kb",
        query_vector=embedding,
        query_filter=models.Filter(
            must=[models.FieldCondition(
                key="pet_type",
                match=models.MatchValue(value=pet_type),
            )]
        ),
        limit=top_k
    )

    return [r.payload["text"] for r in results]
```

Key lesson learned: filter by pet_type early. Without it, a "cat vomiting" query retrieves "cow bloat" results and your LLM confidently tells someone their tabby might have hardware disease. 🐄
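The fix is conceptually tiny: treat species as a hard metadata filter applied before similarity ranking, never as something the embedding space should sort out on its own. Here's the same idea as an in-memory toy (the real version is the `query_filter` in the Qdrant call):

```python
def filter_by_pet_type(candidates, pet_type):
    # Hard metadata filter: applied before any similarity ranking,
    # so cross-species passages can never reach the LLM context.
    return [c for c in candidates if c["pet_type"] == pet_type]

kb_payloads = [
    {"pet_type": "cat",    "text": "Feline vomiting: common causes"},
    {"pet_type": "cattle", "text": "Hardware disease in cattle"},
    {"pet_type": "cat",    "text": "Dehydration signs in cats"},
]
```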


Step 3: The Vision Pipeline

For photo analysis, we use GPT-4o's vision capability with a structured prompt:

```python
import json

def analyze_pet_photo(image_url: str, pet_type: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {
                "role": "user",
                "content": [
                    {
                        "type": "text",
                        "text": f"""Analyze this photo of a {pet_type}.
Focus ONLY on observable physical indicators:
- Body posture and positioning
- Visible skin/coat/eye abnormalities
- Signs of distress or discomfort
- Environmental context clues

Do NOT diagnose. Return structured observations only."""
                    },
                    {
                        "type": "image_url",
                        "image_url": {"url": image_url, "detail": "high"}
                    }
                ]
            }
        ],
        response_format={"type": "json_object"}
    )
    # The API returns a JSON string; parse it before handing it downstream
    return json.loads(response.choices[0].message.content)
```

The "Do NOT diagnose" instruction is critical — without it, the model tries to be a vet, which creates liability nightmares and, more importantly, is just wrong.


Step 4: Severity Scoring

Every analysis produces a severity score (1–5) with explicit escalation thresholds:

```python
from typing import List
from pydantic import BaseModel, Field

SEVERITY_THRESHOLDS = {
    1: "Monitor at home — normal variation",
    2: "Monitor closely — schedule routine vet visit if persists",
    3: "Vet visit recommended within 48–72 hours",
    4: "Vet visit today",
    5: "EMERGENCY — seek veterinary care immediately"
}

class BehaviorAnalysis(BaseModel):
    summary: str
    possible_causes: List[str]
    severity_score: int = Field(ge=1, le=5)
    severity_label: str
    recommended_actions: List[str]
    watch_for: List[str]  # symptoms that would escalate severity
    disclaimer: str = "This analysis is informational only and does not constitute veterinary advice."
```
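One guard worth adding here: derive `severity_label` from the score instead of trusting the model to keep the two fields in sync. A sketch (`label_for_score` is our own helper name, not a library function; thresholds repeated for a standalone snippet):

```python
SEVERITY_THRESHOLDS = {
    1: "Monitor at home — normal variation",
    2: "Monitor closely — schedule routine vet visit if persists",
    3: "Vet visit recommended within 48–72 hours",
    4: "Vet visit today",
    5: "EMERGENCY — seek veterinary care immediately"
}

def label_for_score(score: int) -> str:
    # Reject out-of-range scores rather than clamping silently,
    # so invalid model output surfaces as an explicit error.
    if score not in SEVERITY_THRESHOLDS:
        raise ValueError(f"severity_score must be 1-5, got {score}")
    return SEVERITY_THRESHOLDS[score]
```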

What We Got Wrong (and Fixed)

❌ First attempt: Single massive prompt with all context → LLM got confused, mixed symptoms, hallucinated breed-specific conditions.

✅ Fix: Decompose into a pipeline. Extract first. Retrieve second. Reason third. Compose last. Each step is auditable.

❌ Second mistake: No output validation. The LLM occasionally returned severity_score: 6 or forgot the disclaimer.

✅ Fix: Pydantic everywhere. If the model returns invalid output, retry with an error correction prompt (max 2 retries, then fail gracefully).

❌ Third mistake: Treating all pet types the same. Dogs and cats have completely different baseline behaviors.

✅ Fix: Species-specific system prompts and separate RAG collections per species.
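The retry-with-correction fix turned out to be generic enough to reuse at every pipeline step. A sketch of the pattern (all names here are ours; `generate` would wrap the actual LLM call and fold the correction feedback into the prompt):

```python
def call_with_validation(generate, validate, max_retries=2):
    """Call `generate`, validate the result, and retry with the
    validation error fed back as correction context. Fail after
    max_retries rather than returning invalid output."""
    feedback = None
    last_error = None
    for _ in range(max_retries + 1):
        result = generate(feedback)
        try:
            validate(result)
            return result
        except ValueError as e:
            last_error = e
            feedback = f"Previous output was invalid: {e}. Fix and retry."
    raise RuntimeError(f"validation failed after retries: {last_error}")
```

Capping retries at 2 keeps worst-case latency bounded; after that, it's better to fail loudly than to ship a guess.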


Performance Stats (Production)

| Metric | Value |
| --- | --- |
| Avg. response time (text only) | 2.1s |
| Avg. response time (with image) | 4.8s |
| Severity accuracy* | ~84% |
| False urgency rate | <6% |
| Daily analyses | ~400–600 |

*Validated against vet assessments on a 200-case test set.


Try It

If you have a pet and want to put this to work (or just test it on increasingly weird hypothetical scenarios), it's live at mypettherapist.com — free to try, no account needed for basic analyses.


What's Next

  • Fine-tuned classifier for urgency detection (currently pure prompting)
  • Video analysis — motion pattern recognition for gait analysis
  • Multi-pet households — tracking behavioral changes over time

Building in public. Happy to answer questions about any part of the stack.


What's the weirdest pet behavior you've ever Googled at 2am? Drop it in the comments — I'll run it through the system and share the analysis. 🐾
