Your dog pins their ears back. Your cat flicks their tail. Most pet owners miss 80% of what their animals are telling them — not because they don't care, but because we were never taught the language.
I built MyPetTherapist to fix that. Here's the full technical breakdown of how we use AI to interpret pet body language from a single photo.
The Problem Worth Solving
Vets see your pet for maybe 20 minutes a year. In that window, anxiety, pain signals, and behavioral red flags often stay hidden — pets mask stress in unfamiliar environments. The real behavior happens at home, and it's invisible to professionals.
The question I kept asking: what if a phone camera could become a 24/7 behavioral observer?
Architecture Overview
```
User uploads photo
        ↓
FastAPI backend
        ↓
Vision model (GPT-4o) — keypoint extraction + behavioral inference
        ↓
Structured JSON output
        ↓
Report generator (species-specific templates)
        ↓
PDF/HTML delivery to user
```
Simple on paper. The complexity lives in the prompt engineering and the output validation layer.
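Before digging into each stage, the flow can be sketched as plain function composition. This is an illustrative sketch only: the stubs stand in for the real GPT-4o call and the real renderer, and all names here are made up for the example.

```python
# Pipeline stages as plain functions. Stubs stand in for the real
# vision-model call and PDF/HTML renderer; names are illustrative.

def analyze_stub(image_bytes: bytes, species: str) -> dict:
    # Real version: GPT-4o vision call returning structured JSON (Step 1).
    return {"species": species, "stress_level": 3, "emotional_state": "relaxed"}

def validate(raw: dict) -> dict:
    # Real version: a Pydantic schema guard (Step 2).
    assert 1 <= raw["stress_level"] <= 10
    return raw

def render_report(report: dict) -> str:
    # Real version: species-specific template rendered to PDF/HTML.
    return (f"{report['species'].title()} report: "
            f"{report['emotional_state']}, stress {report['stress_level']}/10")

def run_pipeline(image_bytes: bytes, species: str) -> str:
    return render_report(validate(analyze_stub(image_bytes, species)))
```

Each stage only sees the previous stage's output, which is what makes the validation layer (Step 2) a natural choke point for catching bad model output.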
Step 1: Vision Analysis via GPT-4o
The core call looks like this:
```python
import base64
import json

from openai import OpenAI

client = OpenAI()


def encode_image(image_path: str) -> str:
    with open(image_path, "rb") as f:
        return base64.b64encode(f.read()).decode("utf-8")


def analyze_pet_body_language(image_path: str, species: str) -> dict:
    base64_image = encode_image(image_path)

    system_prompt = f"""
You are a certified animal behaviorist specializing in {species} body language.
Analyze the image and return a structured JSON with:
- posture: overall body posture assessment
- ears: ear position and what it signals
- tail: tail position/movement and interpretation
- eyes: eye shape, dilation, gaze direction
- muscle_tension: visible tension indicators
- stress_level: 1-10 scale with reasoning
- emotional_state: primary emotional label
- confidence: your confidence score 0.0-1.0
- recommendations: list of actionable advice
"""

    response = client.chat.completions.create(
        model="gpt-4o",
        messages=[
            {"role": "system", "content": system_prompt},
            {
                "role": "user",
                "content": [
                    {
                        "type": "image_url",
                        "image_url": {
                            "url": f"data:image/jpeg;base64,{base64_image}",
                            "detail": "high",
                        },
                    },
                    {
                        "type": "text",
                        "text": f"Analyze the {species} body language in this image. Return valid JSON only.",
                    },
                ],
            },
        ],
        response_format={"type": "json_object"},
        max_tokens=1000,
    )

    return json.loads(response.choices[0].message.content)
```
The `response_format={"type": "json_object"}` parameter is doing a lot of heavy lifting here. Without it, GPT-4o will occasionally wrap the JSON in markdown fences or add a preamble sentence, and your parser silently fails.
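If you ever have to parse output without JSON mode (older models, other providers), a defensive extractor is worth having. A minimal sketch, not our production path:

```python
import json
import re

def parse_model_json(raw: str) -> dict:
    """Parse model output that may arrive wrapped in ```json fences
    or preceded by a preamble sentence."""
    # Fast path: the output is already clean JSON.
    try:
        return json.loads(raw)
    except json.JSONDecodeError:
        pass
    # Fallback: pull out the first {...} span, fences and preamble ignored.
    match = re.search(r"\{.*\}", raw, re.DOTALL)
    if not match:
        raise ValueError(f"No JSON object found in model output: {raw[:80]!r}")
    return json.loads(match.group(0))
```

The greedy `\{.*\}` works here because a single JSON object is expected; it would need more care for outputs containing multiple top-level objects.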
Step 2: Output Validation with Pydantic
Raw LLM output needs a schema guard. We use Pydantic v2 for this:
```python
import logging
from typing import Literal

from pydantic import BaseModel, Field, field_validator

logger = logging.getLogger(__name__)


class BodyLanguageReport(BaseModel):
    posture: str
    ears: str
    tail: str | None = None  # snakes don't have tails (well, they do, but)
    eyes: str
    muscle_tension: str
    stress_level: int = Field(ge=1, le=10)
    emotional_state: Literal[
        "relaxed", "anxious", "fearful", "playful",
        "aggressive", "neutral", "excited", "in_pain",
    ]
    confidence: float = Field(ge=0.0, le=1.0)
    recommendations: list[str]
    flagged_for_review: bool = False  # set by the retry layer in Step 4

    @field_validator("recommendations")
    @classmethod
    def limit_recommendations(cls, v: list[str]) -> list[str]:
        # Keep it actionable, not overwhelming
        return v[:5]


def parse_analysis(raw: dict) -> BodyLanguageReport:
    try:
        return BodyLanguageReport(**raw)
    except Exception as e:
        # Log and degrade gracefully upstream
        logger.warning(f"Validation failed: {e}. Raw: {raw}")
        raise ValueError("Analysis parsing failed — retrying with fallback prompt")
```
The `emotional_state` enum was controversial internally; we debated whether to allow free-form strings. We landed on a fixed vocabulary for two reasons:
- Consistency across reports (users compare week-over-week)
- Defensibility — we're not making up states, we're classifying into known behavioral categories
Step 3: Species-Specific Prompt Templates
This is where most pet AI projects get lazy — they treat all animals the same. Dog body language and cat body language are not the same problem.
A wagging tail in a dog = excitement. In a cat = irritation. Same signal, opposite meaning.
We maintain a species_config.yaml:
```yaml
dog:
  tail_baseline: "neutral horizontal"
  stress_indicators:
    - "whale eye (visible sclera)"
    - "lip licking without food present"
    - "yawning in non-tired context"
    - "paw raise"
    - "tucked tail below hock"
  play_indicators:
    - "play bow (front down, rear up)"
    - "loose body wiggle"
    - "open relaxed mouth"

cat:
  tail_baseline: "upright with slight curve = confident/friendly"
  stress_indicators:
    - "dilated pupils in normal light"
    - "flattened ears (airplane ears)"
    - "tail lashing side to side"
    - "whiskers pulled back flat"
    - "crouched low body posture"
  play_indicators:
    - "tail up, slow blink"
    - "chirping at window"
    - "relaxed exposed belly (invitation, NOT submission)"
```
The prompt dynamically injects the relevant species block:
```python
import yaml

with open("species_config.yaml") as f:
    SPECIES_CONFIG = yaml.safe_load(f)


def build_system_prompt(species: str) -> str:
    config = SPECIES_CONFIG.get(species, {})
    stress = "\n".join(f"- {s}" for s in config.get("stress_indicators", []))
    play = "\n".join(f"- {p}" for p in config.get("play_indicators", []))

    return f"""You are a certified animal behaviorist specializing in {species} behavior.

Known {species} stress signals to watch for:
{stress}

Known {species} positive/play signals:
{play}

Analyze the uploaded image with this species-specific context in mind.
Return structured JSON only."""
```
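A quick sanity check of the injection, with a trimmed config inlined as a dict so the snippet runs without the YAML file (the function body mirrors the one above):

```python
# Trimmed, inlined stand-in for species_config.yaml, for illustration only.
SPECIES_CONFIG = {
    "cat": {
        "stress_indicators": ["tail lashing side to side",
                              "flattened ears (airplane ears)"],
        "play_indicators": ["tail up, slow blink"],
    },
}

def build_system_prompt(species: str) -> str:
    config = SPECIES_CONFIG.get(species, {})
    stress = "\n".join(f"- {s}" for s in config.get("stress_indicators", []))
    play = "\n".join(f"- {p}" for p in config.get("play_indicators", []))
    return (
        f"You are a certified animal behaviorist specializing in {species} behavior.\n\n"
        f"Known {species} stress signals to watch for:\n{stress}\n\n"
        f"Known {species} positive/play signals:\n{play}\n\n"
        "Analyze the uploaded image with this species-specific context in mind.\n"
        "Return structured JSON only."
    )

cat_prompt = build_system_prompt("cat")
# Unknown species degrade gracefully to empty signal lists rather than crashing:
ferret_prompt = build_system_prompt("ferret")
```

That graceful degradation for unlisted species is deliberate: an empty signal list still produces a usable prompt while the config catches up.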
Step 4: The Retry + Confidence Threshold Layer
Low-confidence results (< 0.65) trigger a second pass with a simplified prompt:
```python
def analyze_with_fallback(
    image_path: str,
    species: str,
    max_retries: int = 2,
) -> BodyLanguageReport:
    report = None
    for attempt in range(max_retries):
        result = analyze_pet_body_language(image_path, species)
        report = parse_analysis(result)

        if report.confidence >= 0.65:
            return report

        if attempt == 0:
            # Second pass: an explicit "be conservative, uncertainty is OK"
            # instruction gets appended to the system prompt here
            logger.info(
                f"Low confidence ({report.confidence}), retrying with conservative prompt"
            )

    # Best result still below threshold: return it, flagged for review
    report.flagged_for_review = True
    return report
```
We flag low-confidence analyses in our DB and use them as training signal — they're often edge cases (unusual breeds, bad lighting, multiple animals in frame).
What We Learned After 10,000+ Analyses
1. Image quality kills accuracy more than model choice. A blurry iPhone photo in bad lighting will defeat GPT-4o. We added a pre-flight blur detection step using OpenCV before even calling the API.
2. Context matters massively. A dog mid-yawn looks identical to a dog stress-panting. Adding contextual fields ("is this a vet visit photo?", "was the dog just playing?") in the analysis request improved accuracy by ~18%.
3. Users over-interpret stress scores. A stress_level of 6/10 sent some users into panic mode. We added a baseline comparison ("your dog's average is 4.2 — today is 6.0, slightly elevated but within normal range") which reduced support tickets by 40%.
4. Multi-frame analysis > single photo. We're now rolling out video clip analysis (5-second clips) using frame sampling. Early results show significantly better accuracy for dynamic signals like panting rate and ear micromovement.
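The pre-flight blur check from point 1 is worth a sketch. With OpenCV the usual one-liner is `cv2.Laplacian(gray, cv2.CV_64F).var()`; the dependency-free version below shows the same idea, variance of the Laplacian response, with an illustrative threshold rather than our tuned one:

```python
# Blur detection via variance of the Laplacian: sharp images have
# strong edge responses (high variance), blurry ones are near-flat.

def laplacian_variance(gray: list[list[float]]) -> float:
    """Variance of the 4-neighbour Laplacian over interior pixels."""
    responses = []
    for y in range(1, len(gray) - 1):
        for x in range(1, len(gray[0]) - 1):
            lap = (gray[y - 1][x] + gray[y + 1][x]
                   + gray[y][x - 1] + gray[y][x + 1]
                   - 4 * gray[y][x])
            responses.append(lap)
    if not responses:
        return 0.0
    mean = sum(responses) / len(responses)
    return sum((r - mean) ** 2 for r in responses) / len(responses)

def is_too_blurry(gray: list[list[float]], threshold: float = 0.1) -> bool:
    # Threshold is illustrative; tune it on your own image distribution.
    return laplacian_variance(gray) < threshold

flat = [[0.5] * 8 for _ in range(8)]  # featureless frame reads as blurry
checker = [[(x + y) % 2 for x in range(8)] for y in range(8)]  # hard edges
```

Rejecting these frames before the API call saves both money and user trust: a confident-sounding analysis of an unreadable photo is worse than asking for a retake.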
What's Next
The v2 pipeline adds:
- Longitudinal tracking — weekly stress trend graphs per pet
- Multimodal audio — analyzing vocalizations alongside body language
- Vet integration API — so pet owners can share reports directly with their vet
If you're working on anything similar in the pet-tech or animal behavior space, I'd love to compare notes in the comments.
And if you want to see the final product in action — try it free at mypettherapist.com. Upload a photo of your pet and get a full behavioral report in under 60 seconds.
Built with Python, FastAPI, GPT-4o, Pydantic v2, and a lot of research into animal behavior literature. The species config YAML currently covers dogs, cats, rabbits, and birds — horses coming soon.