My dog Biscuit has three moods: hungry, suspicious, and vibing. At least, that's what I thought until I started logging his behavior and running it through a classifier. What I found was... humbling.
This is the story of how I built a lightweight pet emotion detection pipeline over a weekend, what worked, what didn't, and how it accidentally turned into a real product idea.
## The Problem (Yes, There Is One)
Pet owners worry. A lot. According to the American Pet Products Association, 67% of U.S. households own a pet — and a huge chunk of them regularly Google things like "why is my cat staring at the wall" or "dog suddenly scared of nothing".
The real problem isn't weird behavior. It's anxiety without context. Owners see a symptom, spiral, and either over-medicate or under-react.
What if a simple classifier could give them a starting point?
## The Stack
- Python 3.11
- FastAPI for the inference endpoint
- TensorFlow Lite (MobileNetV2 fine-tuned on pet behavior frames)
- OpenCV for frame extraction
- Free-tier Railway deploy
Total cost: ~$0/month at low traffic.
## Step 1: Dataset Assembly
I used:
- YouTube clips — "dog anxiety signs", "cat zoomies explained", "rabbit binky"
- r/aww and r/dogs — scraped public posts with labels in titles
- Manual labeling — 400 short clips. Send help.
Label schema:
```python
LABELS = {
    0: "calm",
    1: "anxious",
    2: "playful",
    3: "aggressive",
    4: "uncertain",
}
```
The `uncertain` class saved me from a lot of false confidence.
## Step 2: Fine-Tuning MobileNetV2
MobileNetV2 is 14MB, fast, and perfect for 224x224 frame classification.
```python
import tensorflow as tf
from tensorflow.keras.applications import MobileNetV2
from tensorflow.keras import layers, Model

# Frozen ImageNet backbone; only the new head trains.
base_model = MobileNetV2(
    input_shape=(224, 224, 3),
    include_top=False,
    weights="imagenet",
)
base_model.trainable = False

# Classification head: pool -> dense -> dropout -> 5-way softmax.
x = base_model.output
x = layers.GlobalAveragePooling2D()(x)
x = layers.Dense(128, activation="relu")(x)
x = layers.Dropout(0.3)(x)
output = layers.Dense(5, activation="softmax")(x)

model = Model(inputs=base_model.input, outputs=output)
model.compile(
    optimizer=tf.keras.optimizers.Adam(1e-4),
    loss="sparse_categorical_crossentropy",  # integer labels from the schema above
    metrics=["accuracy"],
)
```
80/20 train/val split on ~3,200 labeled frames.
Results after 20 epochs: Train 84%, Val 71%.
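The inference endpoint below loads a `pet_emotion.tflite` file, but the Keras→TFLite conversion step isn't shown in the post. This is a sketch of the standard conversion path; the tiny stand-in model here is just so the snippet is self-contained — in practice you'd pass the fine-tuned MobileNetV2 from above:

```python
import tensorflow as tf

# Stand-in model for illustration only; substitute the fine-tuned
# MobileNetV2 `model` from Step 2.
model = tf.keras.Sequential([
    tf.keras.layers.Input(shape=(224, 224, 3)),
    tf.keras.layers.GlobalAveragePooling2D(),
    tf.keras.layers.Dense(5, activation="softmax"),
])

converter = tf.lite.TFLiteConverter.from_keras_model(model)
converter.optimizations = [tf.lite.Optimize.DEFAULT]  # optional weight quantization
tflite_bytes = converter.convert()

with open("pet_emotion.tflite", "wb") as f:
    f.write(tflite_bytes)
```

`Optimize.DEFAULT` shrinks the file via quantization, which matters when you're squeezing a model onto a free-tier deploy.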
## Step 3: FastAPI Inference Endpoint
```python
from fastapi import FastAPI, UploadFile, File
import numpy as np
import cv2
import tensorflow as tf

app = FastAPI()

# Load the TFLite model once at startup, not per request.
interpreter = tf.lite.Interpreter(model_path="pet_emotion.tflite")
interpreter.allocate_tensors()

LABELS = ["calm", "anxious", "playful", "aggressive", "uncertain"]

def preprocess_frame(frame_bytes: bytes) -> np.ndarray:
    arr = np.frombuffer(frame_bytes, np.uint8)
    img = cv2.imdecode(arr, cv2.IMREAD_COLOR)
    img = cv2.cvtColor(img, cv2.COLOR_BGR2RGB)  # OpenCV decodes BGR; training frames were RGB
    img = cv2.resize(img, (224, 224))
    img = img.astype(np.float32) / 255.0
    return np.expand_dims(img, axis=0)

@app.post("/predict")
async def predict_emotion(file: UploadFile = File(...)):
    contents = await file.read()
    tensor = preprocess_frame(contents)

    input_details = interpreter.get_input_details()
    output_details = interpreter.get_output_details()
    interpreter.set_tensor(input_details[0]["index"], tensor)
    interpreter.invoke()
    probs = interpreter.get_tensor(output_details[0]["index"])[0]

    top_idx = int(np.argmax(probs))
    return {
        "emotion": LABELS[top_idx],
        "confidence": round(float(probs[top_idx]), 3),
        "all_probs": {LABELS[i]: round(float(p), 3) for i, p in enumerate(probs)},
    }
```
Sample response:
```json
{
  "emotion": "anxious",
  "confidence": 0.76,
  "all_probs": {
    "calm": 0.09,
    "anxious": 0.76,
    "playful": 0.08,
    "aggressive": 0.02,
    "uncertain": 0.05
  }
}
```
## What Surprised Me
### 1. The "uncertain" class is your best friend
Without it, tired dogs got classified as anxious 40% of the time. The uncertainty bucket dropped false positives dramatically.
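One way to get that effect at inference time is a confidence floor that routes borderline predictions into the uncertainty bucket instead of guessing. A minimal sketch — the 0.5 threshold is my assumption, not a value from the post:

```python
import numpy as np

LABELS = ["calm", "anxious", "playful", "aggressive", "uncertain"]

def with_uncertainty_floor(probs: np.ndarray, threshold: float = 0.5) -> str:
    """Return the top label, or 'uncertain' when confidence is too low.
    (Hypothetical helper; threshold of 0.5 is an assumed default.)"""
    top_idx = int(np.argmax(probs))
    if probs[top_idx] < threshold:
        return "uncertain"
    return LABELS[top_idx]
```

Telling an anxious owner "not sure" beats a confident wrong answer.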
### 2. Breed matters more than expected
A resting Husky and an alarmed Chihuahua activate very similar patterns. I added breed as a metadata feature:
```python
def adjust_for_breed(emotion: str, confidence: float, breed_group: str) -> dict:
    # Toy breeds idle at higher arousal, so discount borderline "anxious" calls.
    if breed_group == "toy" and emotion == "anxious" and confidence < 0.65:
        return {"emotion": "calm", "adjusted": True}
    return {"emotion": emotion, "adjusted": False}
```
### 3. Video > single frame
Single-frame accuracy: 71%. Sliding window majority vote over 5 frames: 81%.
```python
from collections import Counter

def majority_vote(predictions: list[str]) -> str:
    return Counter(predictions).most_common(1)[0][0]
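Applied to a live stream of per-frame labels, the 5-frame sliding window might look like this — a sketch, with the generator name my own invention (the window size of 5 is from the results above):

```python
from collections import Counter, deque

def smooth_stream(frame_labels, window: int = 5):
    """Yield a majority-voted label once the sliding window fills.
    (Hypothetical helper built around the majority-vote idea.)"""
    buf = deque(maxlen=window)  # oldest label falls off automatically
    for label in frame_labels:
        buf.append(label)
        if len(buf) == window:
            yield Counter(buf).most_common(1)[0][0]
```

A single anxious-looking frame inside a calm stretch gets outvoted by its neighbors, which is exactly where the single-frame model was losing its 10 points of accuracy.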
## The Product Angle
After demoing this to pet-owner friends, the consistent feedback was: "I just want to know if my pet is okay."
That's a UX problem, not a model problem. The classifier is a means to an end — reducing owner anxiety is the actual product.
If that framing resonates, check out MyPetTherapist — they're building in this space with a much more holistic approach than a single-frame classifier.
## TL;DR
- MobileNetV2 fine-tuned on 3,200 frames → 71% single-frame, 81% with voting
- FastAPI + TFLite = production-ready weekend stack
- `uncertain` class > accuracy tuning for user trust
- Real challenge: UX framing, not the model
Drop a ❤️ if this was useful — and comment if you've tackled audio classification for animal sounds.