Your dog stares at the wall at 3 AM. Your cat knocks everything off the table — again. Your rabbit is thumping for the fifth time today. You know something is off, but you can't decode it.
I got tired of Googling "why does my dog spin in circles before lying down" and built something better: an AI that interprets pet behavior using a combination of NLP, a fine-tuned symptom classifier, and a lightweight RAG pipeline backed by veterinary literature.
Here's exactly how it works under the hood.
The Problem With Pet Behavior Data
Most pet advice online is:
- Generic ("consult a vet" — thanks, Captain Obvious)
- Anecdotal (Reddit posts from 2011)
- Buried in 3,000-word listicles
What pet owners actually need is fast, contextual interpretation — "My 4-year-old Labrador is licking his paws obsessively after walks" should yield a precise, ranked list of possible causes, not a Wikipedia entry on dogs.
The Architecture
1. Input Layer — Free-text behavior description
Users describe their pet's behavior in plain English. No dropdowns, no forms. Just:
"My cat has been hiding under the bed for 3 days, not eating, and hissing when touched"
```python
from transformers import pipeline

classifier = pipeline(
    "zero-shot-classification",
    model="facebook/bart-large-mnli",
)

labels = [
    "pain or injury",
    "anxiety or stress",
    "illness",
    "territorial behavior",
    "attention-seeking",
    "environmental change",
]

result = classifier(user_input, candidate_labels=labels)
print(result["labels"][0])  # Top hypothesis
```
Zero-shot classification gives us a fast first-pass triage without needing labeled training data for every edge case.
2. RAG Layer — Veterinary knowledge retrieval
Once we have candidate hypotheses, we pull the 5 most relevant chunks from our vector store (built on FAISS + OpenAI embeddings), sourced from:
- AVMA clinical guidelines
- Open-access veterinary journals
- Behavioral ethology research
```python
import faiss
import numpy as np
from openai import OpenAI

client = OpenAI()

def get_embedding(text: str) -> list[float]:
    response = client.embeddings.create(
        input=text,
        model="text-embedding-3-small",
    )
    return response.data[0].embedding

def retrieve_context(query: str, index, chunks: list[str], k: int = 5):
    query_vec = np.array([get_embedding(query)], dtype="float32")
    _, indices = index.search(query_vec, k)
    return [chunks[i] for i in indices[0]]
```
We embed the user's description + top hypothesis together as the retrieval query. This dramatically improves chunk relevance vs. embedding the raw question alone.
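Concretely, the dual query is just the two pieces concatenated before embedding — the exact template below is illustrative, not the production wording:

```python
def build_retrieval_query(description: str, hypothesis: str) -> str:
    # Hypothetical template: what matters is including both the raw
    # description and the classifier's top label in one string.
    return f"{description}\nSuspected category: {hypothesis}"

query = build_retrieval_query(
    "My cat has been hiding under the bed for 3 days, not eating",
    "illness",
)
```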
3. Generation Layer — Contextual response
The retrieved vet literature + the behavioral classification feed into GPT-4o-mini as context. The prompt is engineered to:
- Always give 3 ranked possible causes
- Flag anything that needs urgent vet attention (always placed first in the response)
- Never diagnose — only interpret and inform
- Keep responses under 280 words
```python
system_prompt = """
You are a veterinary behavioral assistant. Given a pet owner's description
and retrieved clinical context, provide:

1. Top 3 possible behavioral causes (ranked by likelihood)
2. One urgent flag if any symptom pattern suggests immediate vet visit
3. One actionable home observation tip

Never diagnose. Always recommend professional consultation for medical concerns.
Be warm, clear, and concise.
"""
```
4. Safety Layer — The urgent flag classifier
Before anything reaches the user, a lightweight binary classifier checks for 47 "red flag" patterns (vomiting + lethargy, seizure-like descriptions, breathing changes). If triggered, the response leads with a clear vet-now callout.
```python
import re

RED_FLAGS = [
    r"can't breathe",
    r"not moving",
    r"blood in (stool|urine|vomit)",
    r"seizure",
    r"collapse",
    # ... 42 more
]

def check_urgent(text: str) -> bool:
    text_lower = text.lower()
    return any(re.search(pattern, text_lower) for pattern in RED_FLAGS)
```
Regex first, model second. Fast and cheap for the most critical path.
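That ordering is a short-circuit: the regex gate handles explicit phrasing for free, and only ambiguous inputs pay for a model call. A self-contained sketch with an abbreviated flag list and a stubbed model stage (the post doesn't publish the actual classifier):

```python
import re

RED_FLAGS = [r"can't breathe", r"seizure", r"collapse"]  # abbreviated list

def regex_urgent(text: str) -> bool:
    t = text.lower()
    return any(re.search(p, t) for p in RED_FLAGS)

def is_urgent(text: str, model_check=None) -> bool:
    if regex_urgent(text):       # stage 1: instant, zero-cost
        return True
    if model_check is not None:  # stage 2: model runs only when regex is silent
        return bool(model_check(text))
    return False
```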
Performance
| Metric | Value |
|---|---|
| Avg response time | 1.8s |
| Classification accuracy (test set) | 83% |
| User satisfaction (post-session survey) | 4.6 / 5 |
| Urgent flag precision | 94% |
The 17% misclassification rate is mostly in ambiguous "normal vs. stress" edge cases — something we're actively improving with a fine-tuned model on labeled behavioral data.
What I Learned
1. Zero-shot beats fine-tuning for cold start. Getting to a working MVP in 2 weeks was only possible because we didn't need labeled training data.
2. Retrieval quality > generation quality. The single biggest improvement came from better chunking (512 tokens with 64-token overlap) and dual-query embeddings, not from swapping models.
3. Safety engineering is non-negotiable. The red flag layer took one afternoon to build. It's arguably the most important part of the entire system.
4. Pet owners are incredibly descriptive. The free-text input assumption held. People give rich, detailed descriptions when they're worried about their animals. That richness is what makes the RAG retrieval sing.
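The sliding-window chunking from lesson 2 is simple enough to sketch in a few lines — here using whitespace-split words as a stand-in for real tokenizer tokens (production code would count model tokens, e.g. with tiktoken):

```python
def chunk_text(text: str, size: int = 512, overlap: int = 64) -> list[str]:
    # Each window advances by (size - overlap) words, so consecutive
    # chunks share `overlap` words of context.
    words = text.split()
    step = size - overlap
    return [
        " ".join(words[i:i + size])
        for i in range(0, max(len(words) - overlap, 1), step)
    ]
```

The overlap keeps a sentence that straddles a chunk boundary retrievable from both sides.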
Try It
If you have a pet acting strange and want to decode it, the tool is live at MyPetTherapist.com. Free to use, no account required.
Drop questions in the comments — happy to go deeper on the RAG pipeline, the FAISS setup, or the safety classifier logic. 🐾