I know, I know — another "I used AI to solve X" post. But hear me out.
I've been obsessing over a question for the past few months: Can AI actually quantify the mental health benefits of animal-assisted therapy? Not just say "pets are good for you" (we all know that), but actually measure it, model it, and build something useful around it.
This post is about what I built, what I learned, and the surprisingly emotional journey of training a model on behavioral data from real therapy sessions.
## The Problem With Pet Therapy Today
Animal-assisted therapy (AAT) has decades of research behind it. Reduced cortisol. Lower blood pressure. Improved outcomes for anxiety, PTSD, autism, dementia. The data is solid.
But the matching process? Still largely manual. A coordinator talks to a patient, talks to a handler, makes a judgment call. It works — but it doesn't scale, and it misses things.
I wanted to build a smarter matching layer.
## The Data
I scraped (with permission) session notes, patient intake forms, and outcome surveys from a pilot program. After anonymization, I had ~1,200 sessions across 340 patients, 47 therapy animals (dogs, cats, rabbits), and 12 handlers.
Features collected per session:
- Patient: age, diagnosis category, anxiety baseline (GAD-7), session goal
- Animal: species, breed, temperament score, age, energy level
- Handler: experience years, specialization
- Outcome: self-reported mood delta (−5 to +5), session notes sentiment
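To make the schema concrete, here's a hypothetical example of one anonymized session record as it looked after cleaning — the field names are illustrative stand-ins for the pilot program's actual intake columns:

```python
import pandas as pd

# One illustrative (made-up) session record matching the feature list above
session = {
    "patient_age": 72,
    "diagnosis_category": "anxiety",
    "gad7_score": 14,              # GAD-7 anxiety baseline, 0-21
    "session_goal": "reduce pre-appointment anxiety",
    "animal_species": "dog",
    "animal_energy_level": 2,      # 1 (calm) to 5 (high energy)
    "animal_temperament_score": 4.2,
    "handler_experience_years": 6,
    "mood_delta": 3,               # self-reported outcome, -5 to +5
}
df = pd.DataFrame([session])
print(df.shape)  # → (1, 9)
```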
## The Stack
```text
# requirements
pandas==2.2.1
scikit-learn==1.4.0
sentence-transformers==2.6.0
fastapi==0.110.0
uvicorn==0.29.0
```
For the matching model, I used a two-stage approach:
### Stage 1: Feature Engineering
```python
import pandas as pd
from sentence_transformers import SentenceTransformer

# Encode session notes as dense vectors
model = SentenceTransformer('all-MiniLM-L6-v2')
df['notes_embedding'] = df['session_notes'].apply(
    lambda x: model.encode(x).tolist()
)

# Numeric features
features = [
    'patient_age', 'gad7_score', 'anxiety_baseline',
    'animal_energy_level', 'animal_temperament_score',
    'handler_experience_years',
]
X = df[features].fillna(df[features].median())
y = df['mood_delta']
```
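The note embeddings above don't appear in the numeric feature matrix that the model actually trains on. One way to fold them in — a sketch on stand-in data, not the pipeline the post benchmarks — is to PCA-compress the 384-dimensional MiniLM vectors and stack them beside the numeric columns:

```python
import numpy as np
from sklearn.decomposition import PCA

rng = np.random.default_rng(0)
emb = rng.normal(size=(1200, 384))   # stand-in for MiniLM note embeddings
X_num = rng.normal(size=(1200, 6))   # stand-in for the 6 numeric features

# Compress embeddings to a handful of components, then concatenate
emb_reduced = PCA(n_components=16, random_state=0).fit_transform(emb)
X_full = np.hstack([X_num, emb_reduced])
print(X_full.shape)  # → (1200, 22)
```

Whether 16 components is the right cut-off would need tuning against held-out MAE; the point is just that the text signal can ride along with the tabular features.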
### Stage 2: Gradient Boosted Matching
```python
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_absolute_error

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

gbr = GradientBoostingRegressor(
    n_estimators=200,
    learning_rate=0.05,
    max_depth=4,
    random_state=42,
)
gbr.fit(X_train, y_train)

preds = gbr.predict(X_test)
mae = mean_absolute_error(y_test, preds)
print(f"MAE: {mae:.3f}")  # → 0.71 on a 10-point scale
```
Not bad for a first pass.
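One sanity check worth running before trusting a single 80/20 holdout: k-fold cross-validation, to make sure the 0.71 isn't a lucky split. Here's the shape of that check on synthetic stand-in data (the real `X`, `y` aren't reproduced here):

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor
from sklearn.model_selection import cross_val_score

# Synthetic stand-in: 1,200 rows, 6 features, a noisy linear target
rng = np.random.default_rng(42)
X = rng.normal(size=(1200, 6))
y = X[:, 1] * -0.5 + X[:, 3] * 0.8 + rng.normal(scale=0.5, size=1200)

gbr = GradientBoostingRegressor(n_estimators=200, learning_rate=0.05,
                                max_depth=4, random_state=42)

# 5-fold CV; sklearn reports negated MAE, so flip the sign
scores = cross_val_score(gbr, X, y, cv=5,
                         scoring="neg_mean_absolute_error")
print(f"CV MAE: {-scores.mean():.3f} ± {scores.std():.3f}")
```

With only ~1,200 sessions and 340 patients, grouping folds by patient (`GroupKFold`) would be the stricter test, since the same patient can appear in multiple sessions.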
## The Surprising Findings
Running SHAP values on the model surfaced some non-obvious correlations:
```python
import shap

explainer = shap.Explainer(gbr, X_train)
shap_values = explainer(X_test)
shap.plots.beeswarm(shap_values)
```
What mattered most (in order):

1. The `gad7_score` × `animal_energy_level` interaction — high-anxiety patients saw dramatically worse outcomes with high-energy dogs, but better outcomes with calm cats or rabbits
2. `handler_experience_years` — more than any animal feature, an experienced handler improved outcomes
3. `patient_age` — older patients (65+) responded ~40% better to dogs than to cats, the opposite of what the coordinators expected
What didn't matter as much as expected:
- Breed (almost irrelevant once energy level was controlled)
- Session length (plateau effect after 20 min)
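The top interaction can also be handed to the model explicitly as a product feature, which often makes it easier for a shallow tree ensemble to pick up. A minimal sketch on toy values, assuming the column names from the feature list above:

```python
import pandas as pd

# Toy rows: high-anxiety/high-energy, low-anxiety/high-energy, mid/calm
df = pd.DataFrame({
    "gad7_score": [18, 5, 12],
    "animal_energy_level": [5, 5, 1],
})

# Explicit interaction term for the gad7 x energy effect SHAP surfaced
df["gad7_x_energy"] = df["gad7_score"] * df["animal_energy_level"]
print(df["gad7_x_energy"].tolist())  # → [90, 25, 12]
```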
## The FastAPI Wrapper
I turned this into a simple matching API:
```python
from fastapi import FastAPI
from pydantic import BaseModel
import numpy as np

app = FastAPI()

class PatientProfile(BaseModel):
    age: int
    gad7_score: float
    anxiety_baseline: float
    session_goal: str  # future: encode this too

class MatchResult(BaseModel):
    recommended_animal_type: str
    predicted_mood_delta: float
    confidence: str

@app.post("/match", response_model=MatchResult)
async def match_patient(profile: PatientProfile):
    # Simplified — real version queries animal availability DB
    features = np.array([[
        profile.age,
        profile.gad7_score,
        profile.anxiety_baseline,
        2.5,  # median energy
        3.0,  # median temperament
        5.0,  # median handler experience
    ]])
    # gbr is the trained GradientBoostingRegressor from Stage 2
    prediction = gbr.predict(features)[0]

    # High anxiety → low-energy animal
    if profile.gad7_score > 15:
        animal = "cat or rabbit"
    else:
        animal = "dog"

    return MatchResult(
        recommended_animal_type=animal,
        predicted_mood_delta=round(prediction, 2),
        confidence="medium" if prediction > 1.5 else "low",
    )
```
## What This Actually Changed
We ran the model in advisory mode (coordinators still made final calls, but saw the model's recommendation) for 3 months.
Results:
- Average mood delta improved from +1.8 to +2.4 (+33%)
- "No improvement" sessions dropped from 18% to 11%
- Coordinator time-per-match dropped ~40%
The most meaningful feedback came from a coordinator who said: "It's not replacing my judgment. It's catching the things I'd miss at the end of a long day."
## What's Next
I'm exploring:
- Computer vision integration: using pose estimation to detect patient engagement levels in real time during sessions
- Longitudinal modeling: tracking mood trajectory over 6+ sessions, not just single-session deltas
- Handler burnout prediction: handlers are the #1 bottleneck — can we predict who needs a break before they quit?
If you're working in mental health tech, digital therapeutics, or just have experience with imbalanced regression targets in healthcare data — I'd love to connect.
This work feeds into MyPetTherapist — a platform connecting people with certified therapy animal programs. If you or someone you know could benefit, check it out.