I know, I know — another "I used AI to solve X" post. But hear me out.
I've been obsessing over a question for the past few months: Can AI actually quantify the mental health benefits of animal-assisted therapy? Not just say "pets are good for you" (we all know that), but actually measure it, model it, and build something useful around it.
This post is about what I built, what I learned, and the surprisingly emotional journey of training a model on behavioral data from real therapy sessions.
## The Problem With Pet Therapy Today
Animal-assisted therapy (AAT) has decades of research behind it. Reduced cortisol. Lower blood pressure. Improved outcomes for anxiety, PTSD, autism, dementia. The data is solid.
But the matching process? Still largely manual. A coordinator talks to a patient, talks to a handler, makes a judgment call. It works — but it doesn't scale, and it misses things.
I wanted to build a smarter matching layer.
## The Data
I scraped (with permission) session notes, patient intake forms, and outcome surveys from a pilot program. After anonymization, I had ~1,200 sessions across 340 patients, 47 therapy animals (dogs, cats, rabbits), and 12 handlers.
Features collected per session:
- Patient: age, diagnosis category, anxiety baseline (GAD-7), session goal
- Animal: species, breed, temperament score, age, energy level
- Handler: experience years, specialization
- Outcome: self-reported mood delta (−5 to +5), session notes sentiment
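To make the schema concrete, here's a hypothetical example of one anonymized session record as it looked after cleaning — the field names are illustrative stand-ins for the pilot program's actual intake columns:

```python
import pandas as pd

# One illustrative (made-up) session record matching the feature list above
session = {
    "patient_age": 72,
    "diagnosis_category": "anxiety",
    "gad7_score": 14,              # GAD-7 anxiety baseline, 0-21
    "session_goal": "reduce pre-appointment anxiety",
    "animal_species": "dog",
    "animal_energy_level": 2,      # 1 (calm) to 5 (high energy)
    "animal_temperament_score": 4.2,
    "handler_experience_years": 6,
    "mood_delta": 3,               # self-reported outcome, -5 to +5
}
df = pd.DataFrame([session])
print(df.shape)  # → (1, 9)
```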
## The Stack
```text
# requirements
pandas==2.2.1
scikit-learn==1.4.0
sentence-transformers==2.6.0
fastapi==0.110.0
uvicorn==0.29.0
```
For the matching model, I used a two-stage approach:
### Stage 1: Feature Engineering
```python
import pandas as pd
from sentence_transformers import SentenceTransformer

# Encode session notes as dense vectors
model = SentenceTransformer('all-MiniLM-L6-v2')
df['notes_embedding'] = df['session_notes'].apply(
    lambda x: model.encode(x).tolist()
)

# Numeric features
features = [
    'patient_age', 'gad7_score', 'anxiety_baseline',
    'animal_energy_level', 'animal_temperament_score',
    'handler_experience_years',
]
X = df[features].fillna(df[features].median())
y = df['mood_delta']
```
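The note embeddings above don't appear in the numeric feature matrix that the model actually trains on. One way to fold them in — a sketch on stand-in data, not the pipeline the post benchmarks — is to PCA-compress the 384-dimensional MiniLM vectors and stack them beside the numeric columns:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
emb = rng.normal(size=(1200, 384))   # stand-in for MiniLM note embeddings
X_num = rng.normal(size=(1200, 6))   # stand-in for the 6 numeric features

# Compress embeddings to a handful of components, then concatenate
emb_reduced = PCA(n_components=16, random_state=0).fit_transform(emb)
X_full = np.hstack([X_num, emb_reduced])
print(X_full.shape)  # → (1200, 22)
```

Whether 16 components is the right cut-off would need tuning against held-out MAE; the point is just that the text signal can ride along with the tabular features.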
### Stage 2: Gradient Boosted Matching
```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

gbr = GradientBoostingRegressor(
    n_estimators=200,
    learning_rate=0.05,
    max_depth=4,
    random_state=42,
)
gbr.fit(X_train, y_train)

preds = gbr.predict(X_test)
mae = mean_absolute_error(y_test, preds)
print(f"MAE: {mae:.3f}")  # → 0.71 on a 10-point scale
```
Not bad for a first pass.
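One sanity check worth running before trusting a single 80/20 holdout: k-fold cross-validation, to make sure the 0.71 isn't a lucky split. Here's the shape of that check on synthetic stand-in data (the real `X`, `y` aren't reproduced here):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 1,200 rows, 6 features, a noisy linear target
rng = np.random.default_rng(42)
X = rng.normal(size=(1200, 6))
y = X[:, 1] * -0.5 + X[:, 3] * 0.8 + rng.normal(scale=0.5, size=1200)

gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05,
                                max_depth=4, random_state=42)

# 5-fold CV; sklearn reports negated MAE, so flip the sign
scores = cross_val_score(gbr, X, y, cv=5,
                         scoring="neg_mean_absolute_error")
print(f"CV MAE: {-scores.mean():.3f} ± {scores.std():.3f}")
```

With only ~1,200 sessions and 340 patients, grouping folds by patient (`GroupKFold`) would be the stricter test, since the same patient can appear in multiple sessions.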
## The Surprising Findings
Running SHAP values on the model surfaced some non-obvious correlations:
```python
import shap

explainer = shap.Explainer(gbr, X_train)
shap_values = explainer(X_test)
shap.plots.beeswarm(shap_values)
```
What mattered most (in order):

1. The `gad7_score` × `animal_energy_level` interaction — high-anxiety patients saw dramatically worse outcomes with high-energy dogs, but better outcomes with calm cats or rabbits
2. `handler_experience_years` — more than any animal feature, an experienced handler improved outcomes
3. `patient_age` — older patients (65+) responded ~40% better to dogs than to cats, the opposite of what the coordinators expected
What didn't matter as much as expected:
- Breed (almost irrelevant once energy level was controlled)
- Session length (plateau effect after 20 min)
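The top interaction can also be handed to the model explicitly as a product feature, which often makes it easier for a shallow tree ensemble to pick up. A minimal sketch on toy values, assuming the column names from the feature list above:

```python
import pandas as pd

# Toy rows: high-anxiety/high-energy, low-anxiety/high-energy, mid/calm
df = pd.DataFrame({
    "gad7_score": [18, 5, 12],
    "animal_energy_level": [5, 5, 1],
})

# Explicit interaction term for the gad7 x energy effect SHAP surfaced
df["gad7_x_energy"] = df["gad7_score"] * df["animal_energy_level"]
print(df["gad7_x_energy"].tolist())  # → [90, 25, 12]
```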
## The FastAPI Wrapper
I turned this into a simple matching API:
```python
from fastapi import FastAPI
from pydantic import BaseModel
import numpy as np

app = FastAPI()

class PatientProfile(BaseModel):
    age: int
    gad7_score: float
    anxiety_baseline: float
    session_goal: str  # future: encode this too

class MatchResult(BaseModel):
    recommended_animal_type: str
    predicted_mood_delta: float
    confidence: str

@app.post("/match", response_model=MatchResult)
async def match_patient(profile: PatientProfile):
    # Simplified — real version queries animal availability DB
    features = np.array([[
        profile.age,
        profile.gad7_score,
        profile.anxiety_baseline,
        2.5,  # median energy
        3.0,  # median temperament
        5.0,  # median handler experience
    ]])
    # gbr is the trained GradientBoostingRegressor from Stage 2
    prediction = gbr.predict(features)[0]

    # High anxiety → low-energy animal
    if profile.gad7_score > 15:
        animal = "cat or rabbit"
    else:
        animal = "dog"

    return MatchResult(
        recommended_animal_type=animal,
        predicted_mood_delta=round(prediction, 2),
        confidence="medium" if prediction > 1.5 else "low",
    )
```
## What This Actually Changed
We ran the model in advisory mode (coordinators still made final calls, but saw the model's recommendation) for 3 months.
Results:
- Average mood delta improved from +1.8 to +2.4 (+33%)
- "No improvement" sessions dropped from 18% to 11%
- Coordinator time-per-match dropped ~40%
The most meaningful feedback came from a coordinator who said: "It's not replacing my judgment. It's catching the things I'd miss at the end of a long day."
## What's Next
I'm exploring:
- Computer vision integration: using pose estimation to detect patient engagement levels in real time during sessions
- Longitudinal modeling: tracking mood trajectory over 6+ sessions, not just single-session deltas
- Handler burnout prediction: handlers are the #1 bottleneck — can we predict who needs a break before they quit?
If you're working in mental health tech, digital therapeutics, or just have experience with imbalanced regression targets in healthcare data — I'd love to connect.
This work feeds into MyPetTherapist — a platform connecting people with certified therapy animal programs. If you or someone you know could benefit, check it out.