Esther Studer
How I Built an AI Pet Therapist Matching Engine (With Python + Embeddings)

We've all seen apps that recommend movies or music based on your mood. But what about pets?

A few months ago I started asking: what if we could use AI to match people with the right kind of emotional support animal based on their lifestyle, mental health needs, and living situation?

The result is MyPetTherapist — and this post breaks down exactly how I built the matching engine.


The Problem With Generic Pet Advice

Most pet recommendation tools are basically fancy quizzes:

  • Do you have a yard? → Get a dog
  • Live in an apartment? → Get a cat
  • Busy lifestyle? → Get a fish

That's surface-level at best. Emotional support animal matching is a different problem entirely. Someone with anxiety doesn't just need a pet — they need the right temperament, energy level, and bonding style for their specific situation.

This is where LLMs genuinely shine.


Architecture Overview

User Input → Embedding Layer → Retrieval (pet profiles DB) → LLM Reasoning → Match Score → Recommendation

Three main components:

  1. Structured intake form (lifestyle + mental health markers)
  2. Vector-embedded pet trait profiles
  3. GPT-4o reasoning layer for nuanced matching

Step 1: Encoding Pet Traits as Vectors

I built a dataset of ~200 dog breeds, 30 cat types, and a handful of exotic pets — each with 40+ trait dimensions:

import openai
import json

client = openai.OpenAI()

def embed_pet_profile(profile: dict) -> list[float]:
    """
    Convert a structured pet profile into a semantic embedding.
    We serialize the traits as natural language first — 
    embeddings work better on prose than raw JSON.
    """
    prose = f"""
    Breed: {profile['breed']}.
    Energy level: {profile['energy']}/10.
    Affection style: {profile['affection_style']}.
    Good with anxiety: {profile['anxiety_friendly']}.
    Noise level: {profile['noise']}/10.
    Independence score: {profile['independence']}/10.
    Ideal owner: {profile['ideal_owner_description']}
    """

    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=prose.strip()
    )
    return response.data[0].embedding

# Example profile
golden = {
    "breed": "Golden Retriever",
    "energy": 7,
    "affection_style": "constant, tactile, unconditional",
    "anxiety_friendly": True,
    "noise": 5,
    "independence": 3,
    "ideal_owner_description": "Active person who works from home or has flexible schedule, benefits from structure and physical affection"
}

vector = embed_pet_profile(golden)
print(f"Vector dims: {len(vector)}")  # 1536

All vectors are stored in Supabase's pgvector extension — dead simple, no separate vector DB infrastructure needed.
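If you're wondering what pgvector's `<=>` cosine-distance operator actually computes, here's a minimal pure-Python equivalent. This is just for intuition — in production the math runs inside Postgres:

```python
import math

def cosine_distance(a: list[float], b: list[float]) -> float:
    """Mirror of pgvector's `<=>` operator: 1 - cosine similarity.

    0.0 means same direction, 1.0 orthogonal, 2.0 opposite.
    """
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return 1.0 - dot / (norm_a * norm_b)

# Toy 2-dim vectors (real embeddings are 1536-dim):
calm_cat = [0.9, 0.1]
hyper_dog = [0.1, 0.9]
anxious_user = [0.8, 0.2]

# The anxious user's query vector sits closer to the calm cat:
print(cosine_distance(anxious_user, calm_cat)
      < cosine_distance(anxious_user, hyper_dog))  # True
```

Lower distance means a better match, which is why the retrieval query sorts ascending on `<=>`.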


Step 2: Translating User State Into a Query Vector

The intake form captures:

  • Living situation (apartment / house / rural)
  • Activity level (sedentary → very active)
  • Work schedule (WFH / office / shift work)
  • Mental health focus: anxiety / depression / loneliness / PTSD / ADHD
  • Allergies, budget, prior pet experience

We convert this into the same embedding space:

def build_user_query(form_data: dict) -> list[float]:
    """Turn form responses into a semantic query and embed it."""

    focus_map = {
        "anxiety": "needs calming presence, predictable behavior, gentle touch",
        "depression": "needs motivation to get up, outdoor companion, unconditional positive regard",
        "loneliness": "needs constant companionship, social bridge with other humans",
        "ptsd": "needs non-startling, reads emotional cues, patient bonding",
        "adhd": "needs engagement without overwhelm, interactive but not chaotic"
    }

    focus_desc = focus_map.get(form_data["primary_need"], "general companionship")

    query = f"""
    Looking for a pet that suits someone with a {form_data['activity_level']} activity level,
    who lives in a {form_data['living_situation']} and works {form_data['work_schedule']}.
    Primary therapeutic need: {focus_desc}.
    Experience level: {form_data['experience']}. Budget: {form_data['budget']}.
    """

    response = client.embeddings.create(
        model="text-embedding-3-small",
        input=query.strip()
    )
    return response.data[0].embedding

Step 3: Cosine Retrieval + LLM Re-ranking
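The retrieval stage is a single pgvector query in production, but it's easy to sketch in memory. One nice property: OpenAI embeddings come back unit-normalized, so cosine similarity reduces to a plain dot product (`top_k` and the toy index below are illustrative, not the real schema):

```python
def top_k(query_vec: list[float], index: list[dict], k: int = 15) -> list[dict]:
    """Rank pet profiles by cosine similarity to the query vector.

    Assumes every vector is unit-normalized (true for OpenAI
    embeddings), so similarity is just a dot product.
    """
    scored = [
        {**pet, "similarity": sum(q * p for q, p in zip(query_vec, pet["vector"]))}
        for pet in index
    ]
    return sorted(scored, key=lambda p: p["similarity"], reverse=True)[:k]

# Toy 2-dim index (real vectors are 1536-dim):
index = [
    {"breed": "Golden Retriever", "vector": [0.6, 0.8]},
    {"breed": "Ragdoll", "vector": [0.98, 0.2]},
]
matches = top_k([1.0, 0.0], index, k=1)
print(matches[0]["breed"])  # Ragdoll
```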

The vector search gives us the top 15 candidates. Then GPT-4o re-ranks them with reasoning:

def rerank_matches(user_profile: dict, candidates: list[dict]) -> list[dict]:
    """
    Use LLM to apply nuanced logic that pure vector similarity misses.
    E.g.: a high-energy dog might score high on 'depression' vector
    but be wrong for someone who is physically limited.
    """

    system_prompt = """
    You are an expert in animal-assisted therapy and human-pet bonding.
    Given a user's profile and a list of candidate pets, re-rank the candidates
    from best to worst match. Consider contraindications carefully.
    Return a JSON object with a 'rankings' array; each entry has
    'rank', 'breed', 'score' (0-100), and 'reason'.
    """

    user_message = f"""
    User profile: {json.dumps(user_profile, indent=2)}

    Candidates: {json.dumps(candidates, indent=2)}
    """

    response = client.chat.completions.create(
        model="gpt-4o",
        response_format={"type": "json_object"},
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_message}
        ]
    )

    return json.loads(response.choices[0].message.content)["rankings"]

This two-stage approach (fast vector retrieval → slow but smart LLM re-ranking) keeps costs low while maintaining quality.
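If you treat each stage as a function, the whole flow composes cleanly. Here's a sketch with the stages injected as parameters — names are illustrative, not the production code — which also makes the plumbing testable offline with stubs instead of live API calls:

```python
from typing import Callable

def match_pipeline(
    form_data: dict,
    embed: Callable[[dict], list[float]],
    retrieve: Callable[[list[float], int], list[dict]],
    rerank: Callable[[dict, list[dict]], list[dict]],
    k: int = 15,
) -> list[dict]:
    """Stage 1: cheap vector retrieval. Stage 2: expensive LLM re-rank."""
    query_vec = embed(form_data)          # one embeddings API call
    candidates = retrieve(query_vec, k)   # pgvector similarity search
    return rerank(form_data, candidates)  # one GPT-4o call on just k items

# Stubbed stages exercise the plumbing without hitting any API:
fake_embed = lambda form: [1.0, 0.0]
fake_retrieve = lambda vec, k: [{"breed": "Ragdoll"}, {"breed": "Beagle"}][:k]
fake_rerank = lambda form, cands: [
    {"rank": i + 1, "breed": c["breed"], "score": 90 - i, "reason": "stub"}
    for i, c in enumerate(cands)
]

result = match_pipeline({}, fake_embed, fake_retrieve, fake_rerank, k=2)
print(result[0]["breed"])  # Ragdoll
```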


What Surprised Me

Cats consistently outperform dogs for PTSD profiles. The independent, non-startling, schedule-setting nature of cats maps unexpectedly well onto trauma recovery — something a pure quiz would never surface.

Rabbits are underrated. Technically ideal for apartment-dwellers with anxiety who want tactile comfort without the commitment of a dog. The model keeps recommending them; users are skeptical at first, then come back months later saying it was right.

The "experience" field is critical. A first-time pet owner matched with a Malinois for their high-energy depression profile is a disaster waiting to happen. The LLM re-ranker catches these edge cases where vector similarity alone fails.
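That particular failure mode is also cheap to guard against before the LLM ever sees the candidates. A hypothetical hard-filter pass (the `difficulty` field and the threshold are illustrative, not from the real dataset):

```python
def filter_contraindications(user: dict, candidates: list[dict]) -> list[dict]:
    """Drop candidates a simple rule can reject outright, pre-re-ranking.

    Hypothetical rule: first-time owners never see breeds with a
    handling difficulty above 7/10 (Malinois territory).
    """
    if user.get("experience") == "first_time":
        return [c for c in candidates if c.get("difficulty", 5) <= 7]
    return candidates

candidates = [
    {"breed": "Belgian Malinois", "difficulty": 10},
    {"breed": "Cavalier King Charles Spaniel", "difficulty": 3},
]
safe = filter_contraindications({"experience": "first_time"}, candidates)
print([c["breed"] for c in safe])  # ['Cavalier King Charles Spaniel']
```

Hard rules like this shrink the candidate list, and the LLM re-ranker still handles the fuzzier contraindications that rules can't express.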


Results So Far

  • ~4,200 matches generated since soft launch
  • 68% of users said their match felt "surprisingly accurate"
  • 12% found it life-changing (direct quotes in our reviews 🥹)
  • Average session: 6 minutes from form start to recommendation

What's Next

  • Fine-tuning on real adoption outcome data (partnering with shelters)
  • Multimodal: let users upload a photo of their living space
  • API for veterinary practices and therapists to use in clinical intake

If you're curious about the emotional-support angle or want to try the matching tool yourself, check out mypettherapist.com — free to use, no account required.


What's your take on using LLMs for matching problems like this? I'd love to hear how others are handling the "nuanced reasoning" gap that pure embeddings leave. Drop a comment below. 👇
