Esther Studer
I Built a Pet Emotion Classifier With Python in a Weekend (Here's What I Learned)

My dog Biscuit has three moods: hungry, suspicious, and vibing. At least, that's what I thought until I started logging his behavior and running it through a classifier. What I found was... humbling.

This is the story of how I built a lightweight pet emotion detection pipeline over a weekend, what worked, what didn't, and how it accidentally turned into a real product idea.


The Problem (Yes, There Is One)

Pet owners worry. A lot. According to the American Pet Products Association, 67% of U.S. households own a pet — and a huge chunk of them regularly Google things like "why is my cat staring at the wall" or "dog suddenly scared of nothing".

The real problem isn't weird behavior. It's anxiety without context. Owners see a symptom, spiral, and either over-medicate or under-react.

What if a simple classifier could give them a starting point?


The Stack

  • Python 3.11
  • FastAPI for the inference endpoint
  • TensorFlow Lite (MobileNetV2 fine-tuned on pet behavior frames)
  • OpenCV for frame extraction
  • Free-tier Railway deploy

Total cost: ~$0/month at low traffic.


Step 1: Dataset Assembly

I used:

  1. YouTube clips — "dog anxiety signs", "cat zoomies explained", "rabbit binky"
  2. r/aww and r/dogs — scraped public posts with labels in titles
  3. Manual labeling — 400 short clips. Send help.

Label schema:

```python
LABELS = {
    0: "calm",
    1: "anxious",
    2: "playful",
    3: "aggressive",
    4: "uncertain"
}
```

The uncertain class saved me a lot of false confidence.


Step 2: Fine-Tuning MobileNetV2

MobileNetV2 is 14MB, fast, and perfect for 224x224 frame classification.

```python
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers, Model

base_model = MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights="imagenet"
)
base_model.trainable = False

x = base_model.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.3)(x)
output = layers.Dense(5, activation="softmax")(x)

model = Model(inputs=base_model.input, outputs=output)
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"]
)
```

80/20 train/val split on ~3,200 labeled frames.

Results after 20 epochs: 84% train accuracy, 71% validation accuracy.
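
One piece I skipped between Step 2 and Step 3: converting the Keras model to the `pet_emotion.tflite` file the API loads. The standard TFLite converter handles it. Here's a sketch — I've swapped in a tiny stand-in model so the snippet runs anywhere; in the real pipeline you'd pass the fine-tuned `model` from above:

```python
import tensorflow as tf
from tensorflow.keras import layers

# Stand-in for the fine-tuned MobileNetV2 from Step 2 (use `model` there).
model = tf.keras.Sequential([
    layers.Input(shape=(224, 224, 3)),
    layers.GlobalAveragePooling2D(),
    layers.Dense(5, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # default weight quantization
tflite_bytes = converter.convert()

with open("pet_emotion.tflite", "wb") as f:
    f.write(tflite_bytes)
```

The `Optimize.DEFAULT` flag shrinks the file via quantization, which matters when you're deploying to a free-tier box.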


Step 3: FastAPI Inference Endpoint

```python
from fastapi import FastAPI, UploadFile, File
import numpy as np
import cv2
import tensorflow as tf

app = FastAPI()
model = tf.lite.Interpreter(model_path="pet_emotion.tflite")
model.allocate_tensors()

LABELS = ["calm", "anxious", "playful", "aggressive", "uncertain"]

def preprocess_frame(frame_bytes: bytes) -> np.ndarray:
    arr = np.frombuffer(frame_bytes, np.uint8)
    img = cv2.imdecode(arr, cv2.IMREAD_COLOR)
    img = cv2.resize(img, (224, 224))
    img = img.astype(np.float32) / 255.0
    return np.expand_dims(img, axis=0)

@app.post("/predict")
async def predict_emotion(file: UploadFile = File(...)):
    contents = await file.read()
    tensor = preprocess_frame(contents)
    input_details = model.get_input_details()
    output_details = model.get_output_details()
    model.set_tensor(input_details[0]["index"], tensor)
    model.invoke()
    probs = model.get_tensor(output_details[0]["index"])[0]
    top_idx = int(np.argmax(probs))
    return {
        "emotion": LABELS[top_idx],
        "confidence": round(float(probs[top_idx]), 3),
        "all_probs": {LABELS[i]: round(float(p), 3) for i, p in enumerate(probs)}
    }
```

Sample response:

```json
{
  "emotion": "anxious",
  "confidence": 0.76,
  "all_probs": {
    "calm": 0.09,
    "anxious": 0.76,
    "playful": 0.08,
    "aggressive": 0.02,
    "uncertain": 0.05
  }
}
```

What Surprised Me

1. The "uncertain" class is your best friend

Without it, tired dogs got classified as anxious 40% of the time. The uncertainty bucket dropped false positives dramatically.
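
Beyond the trained uncertain class, a cheap serving-side guard helps too: if the top softmax probability falls below a threshold, report "uncertain" instead of guessing. A sketch — the 0.6 threshold is my assumption; tune it against validation data:

```python
def route_prediction(probs: list[float], labels: list[str],
                     threshold: float = 0.6) -> str:
    """Return the top label, or 'uncertain' if the model isn't confident enough."""
    top = max(range(len(probs)), key=lambda i: probs[i])
    if probs[top] < threshold:
        return "uncertain"
    return labels[top]


labels = ["calm", "anxious", "playful", "aggressive", "uncertain"]
print(route_prediction([0.40, 0.35, 0.15, 0.05, 0.05], labels))  # -> uncertain
```

Users forgive "I'm not sure" far more readily than a confident wrong answer.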

2. Breed matters more than expected

A resting Husky and an alarmed Chihuahua activate very similar patterns. I added breed as a metadata feature:

```python
def adjust_for_breed(emotion: str, confidence: float, breed_group: str) -> dict:
    if breed_group == "toy" and emotion == "anxious" and confidence < 0.65:
        return {"emotion": "calm", "adjusted": True}
    return {"emotion": emotion, "adjusted": False}
```

3. Video > single frame

Single-frame accuracy: 71%. Sliding window majority vote over 5 frames: 81%.

```python
from collections import Counter

def majority_vote(predictions: list[str]) -> str:
    return Counter(predictions).most_common(1)[0][0]
```
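
For completeness, here's how a 5-frame vote can run over a whole clip's frame-level predictions — a plain trailing sliding window, which is all "majority vote over 5 frames" amounts to. (`smooth_predictions` is my name for the wrapper; `majority_vote` is repeated so the snippet stands alone.)

```python
from collections import Counter


def majority_vote(predictions: list[str]) -> str:
    return Counter(predictions).most_common(1)[0][0]


def smooth_predictions(frame_labels: list[str], window: int = 5) -> list[str]:
    """Replace each frame's label with the majority vote over a trailing window."""
    smoothed = []
    for i in range(len(frame_labels)):
        start = max(0, i - window + 1)  # clamp at the start of the clip
        smoothed.append(majority_vote(frame_labels[start:i + 1]))
    return smoothed
```

A single anxious-looking frame in a run of calm ones gets voted down, which is exactly the flicker that hurt single-frame accuracy.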

The Product Angle

After demoing this to pet-owner friends, the consistent feedback was: "I just want to know if my pet is okay."

That's a UX problem, not a model problem. The classifier is a means to an end — reducing owner anxiety is the actual product.

If that framing resonates, check out MyPetTherapist — they're building in this space with a much more holistic approach than a single-frame classifier.


TL;DR

  • MobileNetV2 fine-tuned on 3,200 frames → 71% single-frame, 81% with voting
  • FastAPI + TFLite = production-ready weekend stack
  • uncertain class > accuracy tuning for user trust
  • Real challenge: UX framing, not the model

Drop a ❤️ if this was useful — and comment if you've tackled audio classification for animal sounds.
