Esther Studer

I Built an AI Pet Behavior Analyzer — Here's the Two-Stage LLM Pattern That Made It Work

Have you ever watched your dog stare at a wall for 20 minutes straight and thought: is something wrong, or is this just... vibes?

I have. And that question led me down a rabbit hole of AI, behavioral science, and a lot of very opinionated veterinarians.

Here's what I built — and more importantly, what it taught me about building AI apps that actually understand emotional nuance.


The Problem: Pet Behavior Is Deeply Contextual

Most AI apps handle clean, structured inputs: "Give me a recipe." "Summarize this PDF." Easy.

Pet behavior is the opposite. It's:

  • Non-verbal
  • Highly contextual (a cat hiding can mean playful, scared, sick, or Tuesday)
  • Owner-biased (we anthropomorphize everything)

Building MyPetTherapist forced me to solve a real LLM challenge: how do you extract signal from emotionally loaded, ambiguous human descriptions?


The Architecture (Simplified)

from openai import OpenAI
import json

client = OpenAI()

SYSTEM_PROMPT = """
You are a veterinary behavioral analyst.
Your job is to:
1. Identify described behaviors with clinical precision
2. Separate owner emotion from objective observation
3. Return structured JSON with confidence scores
4. Flag urgency level: routine / monitor / vet_now

Never diagnose. Always recommend professional follow-up.
"""

def analyze_pet_behavior(owner_description: str, pet_profile: dict) -> dict:
    context = f"""
    Pet: {pet_profile['name']}, {pet_profile['species']}, {pet_profile['age_years']} years old.
    Owner says: {owner_description}
    """

    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": SYSTEM_PROMPT},
            {"role": "user", "content": context}
        ],
        temperature=0.3  # Low temp = more consistent clinical output
    )

    return json.loads(response.choices[0].message.content)

# Example usage
result = analyze_pet_behavior(
    owner_description="He's been hiding under the bed for 3 days and won't eat his favorite treats",
    pet_profile={"name": "Mochi", "species": "cat", "age_years": 4}
)

print(result)
# {
#   "behaviors_identified": ["hiding", "anorexia", "treat_refusal"],
#   "clinical_concern_level": "moderate",
#   "urgency": "monitor",
#   "possible_causes": ["stress", "illness", "environmental_change"],
#   "confidence": 0.78,
#   "recommendation": "Monitor 24h, vet visit if no improvement"
# }

Clean, structured, actionable.


The Hard Part: Emotional Denoising

Here's the real challenge. Owners say things like:

"He seems sad lately and I think he hates me because I went on vacation"

That sentence contains:

  • One observable behavior (probably lethargy)
  • One emotional projection ("hates me")
  • One causal assumption (the vacation)
  • Zero clinical data

My first attempts at parsing this were... a mess. The model would either:

  1. Validate the emotional narrative (bad)
  2. Completely ignore behavioral signals (also bad)

The fix: a two-stage pipeline.

def two_stage_analysis(raw_input: str, pet_profile: dict) -> dict:

    # Stage 1: Separate fact from emotion
    separation_prompt = """
    Given this owner statement, extract:
    - OBSERVABLE_BEHAVIORS: only things that can be seen/measured
    - OWNER_EMOTIONS: what the owner feels
    - ASSUMPTIONS: causal claims owner is making

    Return as JSON. Be ruthlessly objective.
    """

    stage1 = client.chat.completions.create(
        model="gpt-4o-mini",  # Cheaper model for filtering
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": separation_prompt},
            {"role": "user", "content": raw_input}
        ]
    )

    filtered = json.loads(stage1.choices[0].message.content)

    # Stage 2: Analyze only the observable behaviors.
    # Coerce items to str in case the model returns nested objects.
    clean_input = " ".join(str(b) for b in filtered.get("OBSERVABLE_BEHAVIORS", []))

    # If nothing observable survived the filter, fall back to the raw input
    # rather than sending the analyst an empty string
    if not clean_input.strip():
        clean_input = raw_input

    return analyze_pet_behavior(clean_input, pet_profile)

Cost optimization bonus: Stage 1 runs on gpt-4o-mini. Only clean, structured data hits the expensive model. Costs dropped by ~60%.
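Here's a back-of-envelope sketch of why the split saves money. The per-token prices and token counts below are illustrative placeholders (check current OpenAI pricing), so the exact savings ratio will differ from the ~60% I saw in production — the shape of the math is the point:

```python
# Back-of-envelope cost comparison for the two-stage split.
# Prices are illustrative placeholders (USD per 1M input tokens) --
# check current pricing before relying on these numbers.
PRICE_PER_M_INPUT = {"gpt-4o": 2.50, "gpt-4o-mini": 0.15}

def input_cost(model: str, tokens: int) -> float:
    """Input-token cost in dollars for a single request."""
    return tokens / 1_000_000 * PRICE_PER_M_INPUT[model]

# A rambling owner description: ~400 tokens raw, ~80 after denoising.
single_stage = input_cost("gpt-4o", 400)
two_stage = input_cost("gpt-4o-mini", 400) + input_cost("gpt-4o", 80)

print(f"single-stage: ${single_stage:.6f}")
print(f"two-stage:    ${two_stage:.6f}")
print(f"savings:      {(1 - two_stage / single_stage):.0%}")
```

The savings come almost entirely from the expensive model only ever seeing the denoised input.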


Lessons for Any AI Behavioral App

1. Temperature is your clinical dial

  • temperature=0.0 → Deterministic, rigid. Good for medical checklists.
  • temperature=0.7 → Creative, varied. Good for empathetic response generation.
  • temperature=0.3 → The sweet spot for "expert but not robot."

2. Structured output > free text (always)

Use response_format={"type": "json_object"} whenever you need to act on the output. Free text is great for humans. JSON is great for pipelines.
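One caveat: `json_object` mode guarantees syntactically valid JSON, not that the model actually produced the fields you prompted for. Before acting on the output, validate it. A minimal sketch, using the field names from the example output above (adjust to your own schema):

```python
# Guard against schema drift: json_object mode guarantees valid JSON,
# not that the fields you prompted for are present and correctly typed.
REQUIRED_FIELDS = {
    "behaviors_identified": list,
    "urgency": str,
    "confidence": (int, float),  # models sometimes emit 1 instead of 1.0
}

VALID_URGENCY = {"routine", "monitor", "vet_now"}

def validate_analysis(result: dict) -> list[str]:
    """Return a list of problems; an empty list means the payload is usable."""
    problems = []
    for field, expected_type in REQUIRED_FIELDS.items():
        if field not in result:
            problems.append(f"missing field: {field}")
        elif not isinstance(result[field], expected_type):
            problems.append(f"wrong type for {field}")
    if result.get("urgency") not in VALID_URGENCY:
        problems.append(f"unknown urgency: {result.get('urgency')}")
    return problems

good = {"behaviors_identified": ["hiding"], "urgency": "monitor", "confidence": 0.78}
bad = {"urgency": "panic", "confidence": "high"}

print(validate_analysis(good))  # []
print(validate_analysis(bad))
```

If validation fails, retry the call or fall back to a safe default rather than shipping a malformed analysis to the user.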

3. Confidence scores matter

Don't just give an answer. Make the model express uncertainty:

# Add to your system prompt:
"Always include a confidence field (0.0-1.0). 
If confidence < 0.6, set urgency to 'consult_professional'."
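Prompt instructions are requests, not guarantees — the model won't always apply the threshold itself. I'd enforce the same rule in code after parsing (a sketch; `enforce_confidence_policy` is a hypothetical helper, not part of the API):

```python
# Belt-and-braces: the system prompt asks the model to downgrade urgency
# when confidence is low, but prompt instructions are not guarantees.
# Enforce the same rule deterministically after parsing the response.
CONFIDENCE_FLOOR = 0.6

def enforce_confidence_policy(result: dict) -> dict:
    """Override urgency when the model is unsure, per the prompt's rule."""
    if result.get("confidence", 0.0) < CONFIDENCE_FLOOR:
        result = {**result, "urgency": "consult_professional"}
    return result

shaky = {"urgency": "routine", "confidence": 0.41}
solid = {"urgency": "monitor", "confidence": 0.78}

print(enforce_confidence_policy(shaky)["urgency"])  # consult_professional
print(enforce_confidence_policy(solid)["urgency"])  # monitor
```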

4. Validate the edge cases first

My weirdest test cases:

  • "My fish seems depressed" (no behavioral baselines → model must say so)
  • "My dog ate a sock 3 weeks ago and was fine" (historical, non-urgent → must not alarm)
  • "HELP MY CAT IS DYING" (high urgency signal in formatting itself)

All three break naive implementations. Test for panic, ambiguity, and historical inputs.
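For the third case, the urgency signal lives in the formatting itself, so you can catch it before any model call. Here's a cheap pre-triage heuristic I'd sketch — a rough filter, not a substitute for the model's own urgency field:

```python
# Cheap pre-triage: panic often shows up in the *formatting*
# ("HELP MY CAT IS DYING") before any model sees the text.
# Heuristic sketch only -- tune the thresholds on real inputs.
def looks_panicked(text: str) -> bool:
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return False
    caps_ratio = sum(c.isupper() for c in letters) / len(letters)
    return caps_ratio > 0.7 or text.count("!") >= 3

print(looks_panicked("HELP MY CAT IS DYING"))            # True
print(looks_panicked("He's been hiding under the bed"))  # False
```

If this fires, skip the monitor path and surface emergency-vet guidance immediately — false positives are cheap here, false negatives are not.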


What's Next

The v2 architecture includes:

  • Multi-turn conversation to gather more behavioral data over time
  • Embedding-based symptom clustering across thousands of cases
  • Integration with vet appointment booking (the real moat)
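For the clustering idea, the core mechanic is simple enough to sketch without a vector database: embed each case description (with any embedding model), then group cases whose cosine similarity clears a threshold. Toy 3-d vectors stand in for real embeddings here:

```python
# Sketch of symptom clustering: group case descriptions whose embedding
# vectors are nearly parallel. Toy 3-d vectors stand in for real
# high-dimensional embeddings from an embedding model.
import math

def cosine(a: list[float], b: list[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

cases = {
    "hiding_3_days":  [0.9, 0.1, 0.0],
    "under_bed_week": [0.8, 0.2, 0.1],
    "ate_sock":       [0.0, 0.1, 0.9],
}

# Greedy single-link grouping at a fixed similarity threshold
threshold = 0.9
clusters: list[list[str]] = []
for name, vec in cases.items():
    for cluster in clusters:
        if any(cosine(vec, cases[member]) > threshold for member in cluster):
            cluster.append(name)
            break
    else:
        clusters.append([name])

print(clusters)  # the two hiding cases group; the sock case stands alone
```

At real scale you'd swap the greedy loop for a proper clustering algorithm and an approximate-nearest-neighbor index, but the similarity threshold is the knob that matters.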

If you're building something similar — health, behavior, or any emotionally loaded domain — the two-stage denoising pattern is the most valuable thing I've found.


Want to see this in action? MyPetTherapist.com is live — try describing your pet's weirdest behavior and see what the AI makes of it.

Built something cool with LLMs? Drop it in the comments — always looking for architecture inspiration.
