DEV Community

Esther Studer
I Built an AI That Analyzes Pet Behavior — Here's the Python Stack Behind It

My dog stares at me for 45 minutes straight every morning. Not blinking. Just... watching.

So naturally, I did what any developer would do: I built an AI to figure out what she's thinking.

What started as a weekend joke turned into MyPetTherapist — an AI-powered platform that helps pet owners understand their animals' behavior patterns. Here's the stack, the lessons, and the one architectural decision I almost got catastrophically wrong.


The Problem (Beyond My Creepy Dog)

Vet behaviorists charge $250–$500/hour. Most pet owners can't afford that — but they do have smartphones, photos, and a burning need to know why their cat knocks things off tables at 3 AM.

The goal: make behavioral insights accessible, fast, and actually useful.


The Stack

1. Vision Pipeline: GPT-4o + Custom Prompt Engineering

The core is a vision analysis pipeline. When a user uploads a photo or short video, we extract frames and run them through GPT-4o with a carefully structured prompt:

import base64
import json

import openai

def analyze_pet_frame(image_path: str, pet_context: dict) -> dict:
    """
    Analyze a single frame for behavioral signals.
    pet_context: {species, breed, age, known_issues}
    """
    client = openai.OpenAI()

    with open(image_path, "rb") as f:
        b64_image = base64.b64encode(f.read()).decode("utf-8")

    prompt = f"""
    You are an expert animal behaviorist analyzing a {pet_context['species']} ({pet_context['breed']}).

    Analyze this image for:
    1. Body language signals (ears, tail, posture, eyes)
    2. Emotional state indicators (stress, calm, alert, playful)
    3. Environmental triggers visible in frame
    4. Urgency level (1-5, where 5 = vet immediately)

    Context: {pet_context.get('known_issues', 'None provided')}

    Return structured JSON only. No prose.
    """

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url", "image_url": {"url": f"data:image/jpeg;base64,{b64_image}"}}
            ]
        }],
        response_format={"type": "json_object"},
        max_tokens=500
    )

    return json.loads(response.choices[0].message.content)

The `response_format={"type": "json_object"}` enforcement was a game-changer. Early versions returned freeform text, and parsing was a nightmare.
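Even with JSON mode enforced, the model can still omit fields, so it's worth validating each response before trusting it. A minimal sketch — the key names mirror the prompt above, and the defaults and function name are my own assumptions, not part of the production code:

```python
# Fields the downstream aggregation expects, with safe fallbacks.
# Key names follow the prompt's structure; adjust to whatever
# schema your prompt actually requests.
REQUIRED_DEFAULTS = {
    "emotional_state": "unknown",
    "urgency_level": 1,
    "body_language_signals": [],
}

def validate_frame_analysis(raw: dict) -> dict:
    """Fill in missing keys and clamp urgency to the 1-5 scale."""
    result = {key: raw.get(key, default) for key, default in REQUIRED_DEFAULTS.items()}
    result["urgency_level"] = min(5, max(1, int(result["urgency_level"])))
    return result
```

Cheap insurance: a malformed frame degrades to a neutral reading instead of crashing the whole aggregation.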


2. Multi-Frame Aggregation

Single frames lie. A dog with ears back might be sleeping, not scared. We analyze 8–12 frames per submission and aggregate:

from collections import Counter
import statistics

def aggregate_behavioral_signals(frame_analyses: list[dict]) -> dict:
    """Aggregate signals across multiple frames for reliable assessment."""

    emotional_states = [f["emotional_state"] for f in frame_analyses]
    urgency_scores = [f["urgency_level"] for f in frame_analyses]

    # Dominant state wins, but outliers matter
    state_counts = Counter(emotional_states)
    dominant_state = state_counts.most_common(1)[0][0]

    # High urgency in ANY frame = escalate
    max_urgency = max(urgency_scores)
    avg_urgency = statistics.mean(urgency_scores)

    # Collect all unique body language signals
    all_signals = []
    for frame in frame_analyses:
        all_signals.extend(frame.get("body_language_signals", []))
    unique_signals = list(set(all_signals))

    return {
        "dominant_emotional_state": dominant_state,
        "confidence": state_counts[dominant_state] / len(emotional_states),
        "urgency_level": max_urgency,  # Conservative: use max
        "avg_urgency": round(avg_urgency, 1),
        "body_language_signals": unique_signals,
        "frame_count": len(frame_analyses)
    }

The key insight: use `max_urgency`, not `avg_urgency`. We're not optimizing for user comfort — we're optimizing for pet safety. One frame showing distress is enough to flag.
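Downstream, that conservative urgency score can drive how the result is framed to the owner. A hypothetical mapping — these tiers and wordings are illustrative, not the production thresholds, which would need veterinary input:

```python
def urgency_to_action(urgency: int) -> str:
    """Map the 1-5 urgency scale to owner-facing guidance.

    Tiers are illustrative; real thresholds would be tuned with
    veterinary review.
    """
    if urgency >= 5:
        return "Contact a vet immediately."
    if urgency == 4:
        return "Schedule a vet visit within 24-48 hours."
    if urgency == 3:
        return "Monitor closely and note any changes."
    return "No immediate concern; keep observing."
```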


3. The Caching Layer That Saved Us

This is where I almost made a catastrophic mistake.

My first instinct was to cache by image hash — if the same photo is uploaded twice, return cached results. Seemed obvious.

Problem: Two different dogs submitting the same photo hash would get each other's analysis. This never actually happened, but the possibility made me lose sleep.

The correct approach:

import hashlib
import json

import redis

def get_cache_key(image_hash: str, pet_id: str, pet_context: dict) -> str:
    """
    Cache key MUST include pet identity — same image, different pet = different analysis.
    """
    context_hash = hashlib.md5(
        str(sorted(pet_context.items())).encode()
    ).hexdigest()[:8]

    return f"analysis:{pet_id}:{image_hash}:{context_hash}"

def get_or_analyze(image_path: str, pet_id: str, pet_context: dict) -> dict:
    r = redis.Redis()

    with open(image_path, "rb") as f:
        image_hash = hashlib.sha256(f.read()).hexdigest()[:16]

    cache_key = get_cache_key(image_hash, pet_id, pet_context)

    cached = r.get(cache_key)
    if cached:
        return json.loads(cached)

    # Analyze frames
    frames = extract_frames(image_path, count=10)
    analyses = [analyze_pet_frame(frame, pet_context) for frame in frames]
    result = aggregate_behavioral_signals(analyses)

    # Cache for 7 days — behavior context doesn't change that fast
    r.setex(cache_key, 604800, json.dumps(result))

    return result

Always include pet_id in the cache key. Always.
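One piece elided above is `extract_frames`. The actual decoding would use something like OpenCV, but the interesting part is just picking evenly spaced frames across the clip. A sketch of that selection logic — the function name and signature are my own, since the original helper isn't shown in the post:

```python
def frame_indices(total_frames: int, count: int = 10) -> list[int]:
    """Pick up to `count` evenly spaced frame indices from a clip.

    For a still image (total_frames == 1) this returns [0], so the
    single frame is analyzed once. With OpenCV you would then seek
    to each index via cap.set(cv2.CAP_PROP_POS_FRAMES, idx) and
    read that frame out.
    """
    if total_frames <= count:
        return list(range(total_frames))
    step = (total_frames - 1) / (count - 1)
    return [round(i * step) for i in range(count)]
```

Even spacing matters: sampling only the first second of a 30-second clip would miss exactly the behavioral transitions the multi-frame aggregation exists to catch.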


4. The Report Generator

Raw JSON means nothing to a worried pet owner at midnight. We convert analysis output into human-readable reports using a secondary LLM pass:

def generate_owner_report(analysis: dict, pet_name: str, pet_species: str) -> str:
    """Convert technical analysis into actionable owner guidance."""

    client = openai.OpenAI()

    prompt = f"""
    You are a compassionate pet behaviorist writing a report for a concerned owner.

    Pet: {pet_name} ({pet_species})
    Analysis data: {json.dumps(analysis, indent=2)}

    Write a warm, clear report (200-300 words) that:
    - Explains what you observed in plain language
    - Gives 2-3 specific, actionable recommendations  
    - Flags any urgency concerns clearly but without causing panic
    - Ends with one positive observation about their pet

    Tone: Professional but warm. Like a trusted friend who happens to be a vet.
    """

    response = client.chat.completions.create(
        model="gpt-4o-mini",  # Cheaper for prose generation
        messages=[{"role": "user", "content": prompt}],
        max_tokens=400
    )

    return response.choices[0].message.content

Notice we use `gpt-4o-mini` here, not `gpt-4o`. The vision analysis needs the full model's accuracy. The report writing is prose generation — mini handles it perfectly at 10% of the cost.


What I Learned

1. Veterinary AI is trust-critical, not just accuracy-critical.
Users don't just need correct answers — they need to feel confident in those answers. This shaped every UX decision.

2. Multi-frame > single-frame, always.
Our accuracy on behavioral state detection went from 67% to 89% just by analyzing 10 frames instead of 1.

3. The cache key is a security decision, not a performance decision.
Treat it that way from day one.

4. GPT-4o-mini for prose, GPT-4o for vision/reasoning.
This alone cut our inference costs by ~60%.


What's Next

We're working on longitudinal tracking — comparing a pet's behavior over weeks to spot gradual changes that owners miss. A dog that's "always a little anxious" might actually be getting worse.
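The comparison could start as simply as tracking a rolling weekly average of the urgency score and flagging upward drift. A sketch of one way it might work — we haven't shipped this, and the function name and threshold here are hypothetical:

```python
import statistics

def detect_behavioral_drift(weekly_urgency: list[float], threshold: float = 0.5) -> bool:
    """Flag a pet whose recent average urgency has crept upward.

    weekly_urgency: one average urgency score per week, oldest first.
    Compares the two most recent weeks against the earlier baseline.
    Threshold is a placeholder; a real system would tune it per
    species and validate against vet-confirmed outcomes.
    """
    if len(weekly_urgency) < 4:
        return False  # not enough history to call a trend
    baseline = statistics.mean(weekly_urgency[:-2])
    recent = statistics.mean(weekly_urgency[-2:])
    return recent - baseline >= threshold
```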

If you're curious about the platform or want to try it out, check out MyPetTherapist.com.

And yes, turns out my dog stares at me because she wants breakfast. The AI was correct. Slightly embarrassing.


Built something similar? I'd love to compare notes in the comments. Especially if you've tackled multi-species support — cats are a whole different problem.
