Your dog stares at the wall at 3 AM. Your cat knocks everything off the table — again. Your rabbit is thumping for the fifth time today. You know something is off, but you can't decode it.
I got tired of Googling "why does my dog spin in circles before lying down" and built something better: an AI that interprets pet behavior using a combination of NLP, a fine-tuned symptom classifier, and a lightweight RAG pipeline backed by veterinary literature.
Here's exactly how it works under the hood.
The Problem With Pet Behavior Data
Most pet advice online is:
- Generic ("consult a vet" — thanks, Captain Obvious)
- Anecdotal (Reddit posts from 2011)
- Buried in 3,000-word listicles
What pet owners actually need is fast, contextual interpretation — "My 4-year-old Labrador is licking his paws obsessively after walks" should yield a precise, ranked list of possible causes, not a Wikipedia entry on dogs.
The Architecture
1. Input Layer — Free-text behavior description
Users describe their pet's behavior in plain English. No dropdowns, no forms. Just:
"My cat has been hiding under the bed for 3 days, not eating, and hissing when touched"
```python
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
)

labels = [
    "pain or injury",
    "anxiety or stress",
    "illness",
    "territorial behavior",
    "attention-seeking",
    "environmental change",
]

result = classifier(user_input, candidate_labels=labels)
print(result["labels"][0])  # Top hypothesis
```
Zero-shot classification gives us a fast first-pass triage without needing labeled training data for every edge case.
2. RAG Layer — Veterinary knowledge retrieval
Once we have candidate hypotheses, we pull the 5 most relevant chunks from our vector store (built on FAISS + OpenAI embeddings), sourced from:
- AVMA clinical guidelines
- Open-access veterinary journals
- Behavioral ethology research
```python
import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()

def get_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        input=text,
        model="text-embedding-3-small",
    )
    return response.data[0].embedding

def retrieve_context(query: str, index, chunks: list[str], k: int = 5):
    query_vec = np.array([get_embedding(query)], dtype="float32")
    _, indices = index.search(query_vec, k)
    return [chunks[i] for i in indices[0]]
```
We embed the user's description + top hypothesis together as the retrieval query. This dramatically improves chunk relevance vs. embedding the raw question alone.
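Concretely, the dual query is just the two pieces concatenated before embedding — the exact template below is illustrative, not the production wording:

```python
def build_retrieval_query(description: str, hypothesis: str) -> str:
    # Hypothetical template: what matters is including both the raw
    # description and the classifier's top label in one string.
    return f"{description}\nSuspected category: {hypothesis}"

query = build_retrieval_query(
    "My cat has been hiding under the bed for 3 days, not eating",
    "illness",
)
```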
3. Generation Layer — Contextual response
The retrieved vet literature + the behavioral classification feed into GPT-4o-mini as context. The prompt is engineered to:
- Always give 3 ranked possible causes
- Flag anything that needs urgent vet attention (always placed first in the response)
- Never diagnose — only interpret and inform
- Keep responses under 280 words
```python
system_prompt = """
You are a veterinary behavioral assistant. Given a pet owner's description
and retrieved clinical context, provide:

1. Top 3 possible behavioral causes (ranked by likelihood)
2. One urgent flag if any symptom pattern suggests immediate vet visit
3. One actionable home observation tip

Never diagnose. Always recommend professional consultation for medical concerns.
Be warm, clear, and concise.
"""
```
4. Safety Layer — The urgent flag classifier
Before anything reaches the user, a lightweight binary classifier checks for 47 "red flag" patterns (vomiting + lethargy, seizure-like descriptions, breathing changes). If triggered, the response leads with a clear vet-now callout.
```python
import re

RED_FLAGS = [
    r"can't breathe",
    r"not moving",
    r"blood in (stool|urine|vomit)",
    r"seizure",
    r"collapse",
    # ... 42 more
]

def check_urgent(text: str) -> bool:
    text_lower = text.lower()
    return any(re.search(pattern, text_lower) for pattern in RED_FLAGS)
```
Regex first, model second. Fast and cheap for the most critical path.
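That ordering is a short-circuit: the regex gate handles explicit phrasing for free, and only ambiguous inputs pay for a model call. A self-contained sketch with an abbreviated flag list and a stubbed model stage (the post doesn't publish the actual classifier):

```python
import re

RED_FLAGS = [r"can't breathe", r"seizure", r"collapse"]  # abbreviated list

def regex_urgent(text: str) -> bool:
    t = text.lower()
    return any(re.search(p, t) for p in RED_FLAGS)

def is_urgent(text: str, model_check=None) -> bool:
    if regex_urgent(text):       # stage 1: instant, zero-cost
        return True
    if model_check is not None:  # stage 2: model runs only when regex is silent
        return bool(model_check(text))
    return False
```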
Performance
| Metric | Value |
|---|---|
| Avg response time | 1.8s |
| Classification accuracy (test set) | 83% |
| User satisfaction (post-session survey) | 4.6 / 5 |
| Urgent flag precision | 94% |
The 17% misclassification rate is mostly in ambiguous "normal vs. stress" edge cases — something we're actively improving with a fine-tuned model on labeled behavioral data.
What I Learned
1. Zero-shot beats fine-tuning for cold start. Getting to a working MVP in 2 weeks was only possible because we didn't need labeled training data.
2. Retrieval quality > generation quality. The single biggest improvement came from better chunking (512 tokens with 64-token overlap) and dual-query embeddings, not from swapping models.
3. Safety engineering is non-negotiable. The red flag layer took one afternoon to build. It's arguably the most important part of the entire system.
4. Pet owners are incredibly descriptive. The free-text input assumption held. People give rich, detailed descriptions when they're worried about their animals. That richness is what makes the RAG retrieval sing.
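The sliding-window chunking from lesson 2 is simple enough to sketch in a few lines — here using whitespace-split words as a stand-in for real tokenizer tokens (production code would count model tokens, e.g. with tiktoken):

```python
def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    # Each window advances by (size - overlap) words, so consecutive
    # chunks share `overlap` words of context.
    words = text.split()
    step = size - overlap
    return [
        " ".join(words[i:i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from both sides.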
Try It
If you have a pet acting strange and want to decode it, the tool is live at MyPetTherapist.com. Free to use, no account required.
Drop questions in the comments — happy to go deeper on the RAG pipeline, the FAISS setup, or the safety classifier logic. 🐾