My dog Biscuit has three moods: hungry, suspicious, and vibing. At least, that's what I thought until I started logging his behavior and running it through a classifier. What I found was... humbling.
This is the story of how I built a lightweight pet emotion detection pipeline over a weekend, what worked, what didn't, and how it accidentally turned into a real product idea.
## The Problem (Yes, There Is One)
Pet owners worry. A lot. According to the American Pet Products Association, 67% of U.S. households own a pet — and a huge chunk of them regularly Google things like "why is my cat staring at the wall" or "dog suddenly scared of nothing".
The real problem isn't weird behavior. It's anxiety without context. Owners see a symptom, spiral, and either over-medicate or under-react.
What if a simple classifier could give them a starting point?
## The Stack
- Python 3.11
- FastAPI for the inference endpoint
- TensorFlow Lite (MobileNetV2 fine-tuned on pet behavior frames)
- OpenCV for frame extraction
- Free-tier Railway deploy
Total cost: ~$0/month at low traffic.
## Step 1: Dataset Assembly
I used:
- YouTube clips — "dog anxiety signs", "cat zoomies explained", "rabbit binky"
- r/aww and r/dogs — scraped public posts with labels in titles
- Manual labeling — 400 short clips. Send help.
Label schema:
```python
LABELS = {
    0: "calm",
    1: "anxious",
    2: "playful",
    3: "aggressive",
    4: "uncertain",
}
```
The `uncertain` class saved me from a lot of false confidence.
## Step 2: Fine-Tuning MobileNetV2
MobileNetV2 is 14MB, fast, and perfect for 224x224 frame classification.
```python
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers, Model

# Frozen ImageNet backbone; only the new head trains.
base_model = MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights="imagenet",
)
base_model.trainable = False

# Classification head: pool -> dense -> dropout -> 5-way softmax.
x = base_model.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.3)(x)
output = layers.Dense(5, activation="softmax")(x)

model = Model(inputs=base_model.input, outputs=output)
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss="sparse_categorical_crossentropy",  # integer labels from the schema above
    metrics=["accuracy"],
)
```
80/20 train/val split on ~3,200 labeled frames.
Results after 20 epochs: Train 84%, Val 71%.
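The inference endpoint below loads a `pet_emotion.tflite` file, but the Keras→TFLite conversion step isn't shown in the post. This is a sketch of the standard conversion path; the tiny stand-in model here is just so the snippet is self-contained — in practice you'd pass the fine-tuned MobileNetV2 from above:

```python
import tensorflow as tf

# Stand-in model for illustration only; substitute the fine-tuned
# MobileNetV2 `model` from Step 2.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional weight quantization
tflite_bytes = converter.convert()

with open("pet_emotion.tflite", "wb") as f:
    f.write(tflite_bytes)
```

`Optimize.DEFAULT` shrinks the file via quantization, which matters when you're squeezing a model onto a free-tier deploy.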
## Step 3: FastAPI Inference Endpoint
```python
from fastapi import FastAPI, UploadFile, File
import numpy as np
import cv2
import tensorflow as tf

app = FastAPI()

# Load the TFLite model once at startup, not per request.
interpreter = tf.lite.Interpreter(model_path="pet_emotion.tflite")
interpreter.allocate_tensors()

LABELS = ["calm", "anxious", "playful", "aggressive", "uncertain"]

def preprocess_frame(frame_bytes: bytes) -> np.ndarray:
    arr = np.frombuffer(frame_bytes, np.uint8)
    img = cv2.imdecode(arr, cv2.IMREAD_COLOR)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV decodes BGR; training frames were RGB
    img = cv2.resize(img, (224, 224))
    img = img.astype(np.float32) / 255.0
    return np.expand_dims(img, axis=0)

@app.post("/predict")
async def predict_emotion(file: UploadFile = File(...)):
    contents = await file.read()
    tensor = preprocess_frame(contents)

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    interpreter.set_tensor(input_details[0]["index"], tensor)
    interpreter.invoke()
    probs = interpreter.get_tensor(output_details[0]["index"])[0]

    top_idx = int(np.argmax(probs))
    return {
        "emotion": LABELS[top_idx],
        "confidence": round(float(probs[top_idx]), 3),
        "all_probs": {LABELS[i]: round(float(p), 3) for i, p in enumerate(probs)},
    }
```
Sample response:
```json
{
  "emotion": "anxious",
  "confidence": 0.76,
  "all_probs": {
    "calm": 0.09,
    "anxious": 0.76,
    "playful": 0.08,
    "aggressive": 0.02,
    "uncertain": 0.05
  }
}
```
## What Surprised Me
### 1. The "uncertain" class is your best friend
Without it, tired dogs got classified as anxious 40% of the time. The uncertainty bucket dropped false positives dramatically.
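One way to get that effect at inference time is a confidence floor that routes borderline predictions into the uncertainty bucket instead of guessing. A minimal sketch — the 0.5 threshold is my assumption, not a value from the post:

```python
import numpy as np

LABELS = ["calm", "anxious", "playful", "aggressive", "uncertain"]

def with_uncertainty_floor(probs: np.ndarray, threshold: float = 0.5) -> str:
    """Return the top label, or 'uncertain' when confidence is too low.
    (Hypothetical helper; threshold of 0.5 is an assumed default.)"""
    top_idx = int(np.argmax(probs))
    if probs[top_idx] < threshold:
        return "uncertain"
    return LABELS[top_idx]
```

Telling an anxious owner "not sure" beats a confident wrong answer.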
### 2. Breed matters more than expected
A resting Husky and an alarmed Chihuahua activate very similar patterns. I added breed as a metadata feature:
```python
def adjust_for_breed(emotion: str, confidence: float, breed_group: str) -> dict:
    # Toy breeds idle at higher arousal, so discount borderline "anxious" calls.
    if breed_group == "toy" and emotion == "anxious" and confidence < 0.65:
        return {"emotion": "calm", "adjusted": True}
    return {"emotion": emotion, "adjusted": False}
```
### 3. Video > single frame
Single-frame accuracy: 71%. Sliding window majority vote over 5 frames: 81%.
```python
from collections import Counter

def majority_vote(predictions: list[str]) -> str:
    return Counter(predictions).most_common(1)[0][0]
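Applied to a live stream of per-frame labels, the 5-frame sliding window might look like this — a sketch, with the generator name my own invention (the window size of 5 is from the results above):

```python
from collections import Counter, deque

def smooth_stream(frame_labels, window: int = 5):
    """Yield a majority-voted label once the sliding window fills.
    (Hypothetical helper built around the majority-vote idea.)"""
    buf = deque(maxlen=window)  # oldest label falls off automatically
    for label in frame_labels:
        buf.append(label)
        if len(buf) == window:
            yield Counter(buf).most_common(1)[0][0]
```

A single anxious-looking frame inside a calm stretch gets outvoted by its neighbors, which is exactly where the single-frame model was losing its 10 points of accuracy.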
## The Product Angle
After demoing this to pet-owner friends, the consistent feedback was: "I just want to know if my pet is okay."
That's a UX problem, not a model problem. The classifier is a means to an end — reducing owner anxiety is the actual product.
If that framing resonates, check out MyPetTherapist — they're building in this space with a much more holistic approach than a single-frame classifier.
## TL;DR
- MobileNetV2 fine-tuned on 3,200 frames → 71% single-frame, 81% with voting
- FastAPI + TFLite = production-ready weekend stack
- `uncertain` class > accuracy tuning for user trust
- Real challenge: UX framing, not the model
Drop a ❤️ if this was useful — and comment if you've tackled audio classification for animal sounds.