
Rob Marcarelli

Bulldog Behavior Interpreter

What I Built
I built the "Bulldog Behavior Interpreter," a specialized web application designed to help bulldog owners understand the unique and often perplexing behaviors of their beloved pets. Bulldogs have a distinct way of communicating through grunts, postures, and actions that can be misinterpreted. This applet bridges that communication gap.

Users can upload an image, a short video clip, or even an audio recording of their bulldog. For in-the-moment analysis, they can use the "Live Capture" feature to snap a photo directly from their device's camera. The app then uses the Gemini 2.5 Flash model to perform a sophisticated multimodal analysis, providing a simple, three-part breakdown (sketched as a typed shape after the list):

Behavior: A concise name for the likely behavior (e.g., "Comfort Seeking," "Dominance Play").
Explanation: A plain-language description of what the behavior means.
Actionable Tip: A simple suggestion for how the owner can respond.
The app also includes a history of recent analyses, along with prominent disclaimers reminding users to consult a veterinary professional about any health concerns.
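
Because the model's output is constrained to a fixed JSON schema (more on that below), each analysis maps naturally onto a small typed shape on the frontend. Here is a rough sketch: the field names match the three-part breakdown above, but the interface itself is illustrative, not the app's actual source.

```ts
// Illustrative shape for one analysis result; the field names mirror
// the app's three-part breakdown, the rest is a sketch.
interface BehaviorAnalysis {
  behavior: string;    // concise name, e.g. "Comfort Seeking"
  explanation: string; // plain-language description of the behavior
  tip: string;         // simple suggestion for how the owner can respond
}

// The model is forced to return exactly these fields (see the JSON mode
// discussion below), so the response text parses straight into the type.
function parseAnalysis(jsonText: string): BehaviorAnalysis {
  return JSON.parse(jsonText) as BehaviorAnalysis;
}
```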

Demo
https://bulldog-behavior-interpreter-163162226436.us-west1.run.app

How I Used Google AI Studio
Google AI Studio was instrumental throughout the development process. I used it as my primary environment for prompt engineering and testing the multimodal capabilities of the Gemini 2.5 Flash model.

Specifically, I leveraged AI Studio to:

Craft the System Prompt: I developed and refined the core system instruction that primes the model to act as a bulldog behavior expert.
Define a Structured Output: A key feature of the app is its consistent, reliable output. I used the JSON mode in AI Studio to define and enforce a strict responseSchema. This ensures the model always returns the behavior, explanation, and tip fields, which I can then parse and display cleanly in the UI (a sketch of the equivalent API call follows this list).
Test Multimodal Inputs: I tested various combinations of inputs directly in the studio—uploading images with text, videos, and audio clips—to see how the model would respond. This rapid iteration was crucial for building confidence in the analysis logic before writing a single line of frontend code.
The ability to quickly prototype and validate the core AI functionality in Google AI Studio saved significant development time and resulted in a more robust and predictable application.
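
As a rough illustration of that setup, here is a minimal sketch of the equivalent call through the @google/genai JavaScript SDK. The gemini-2.5-flash model and the behavior/explanation/tip fields match what the post describes; the system-prompt text and schema details are paraphrased, not the exact values used in the app.

```ts
import { GoogleGenAI, Type } from "@google/genai";

// Assumes the API key is provided via an environment variable.
const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

async function analyzeBehavior(description: string): Promise<string> {
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: description,
    config: {
      // Paraphrase of the system prompt refined in AI Studio.
      systemInstruction:
        "You are an expert in bulldog behavior. Interpret the owner's " +
        "input and return a behavior name, a plain-language explanation, " +
        "and one actionable tip.",
      // JSON mode: the model must return exactly these three fields.
      responseMimeType: "application/json",
      responseSchema: {
        type: Type.OBJECT,
        properties: {
          behavior: { type: Type.STRING },
          explanation: { type: Type.STRING },
          tip: { type: Type.STRING },
        },
        required: ["behavior", "explanation", "tip"],
      },
    },
  });
  return response.text ?? "";
}
```

Enforcing the schema at the API level is what lets the UI render the three fields without defensive checks for missing keys.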

Multimodal Features
The "Bulldog Behavior Interpreter" is fundamentally multimodal, combining different types of media to create a holistic and context-aware analysis that a single-mode model could not achieve.

Image + Video Analysis: The app analyzes visual data to interpret key bulldog behaviors. It can recognize subtle cues like posture (a low-slung head), facial expressions (a wrinkled snout), and actions (such as "ear-sucking" or "head licking") from both static photos and video clips.
Audio Analysis: Bulldogs are very vocal. The app allows users to upload audio files to analyze sounds like whining, grunting, snoring, or specific types of barks. This adds a crucial layer of context that visuals alone might miss.
Textual Context: The user can add an optional text prompt to describe the situation. This allows the model to fuse the visual/auditory data with the owner's observation (e.g., Image of a bulldog by the door + "He's been making a whining sound"). This fusion leads to a much more accurate interpretation than either input could provide alone.
Live Camera Input: The "Live Capture" feature is a powerful multimodal interaction. It allows users to capture transient behaviors as they happen, removing the friction of having to record a video, save it, and then upload it. This real-time capability makes the tool far more practical for everyday use (see the sketch just below this list).
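
To show how the live-capture and fusion pieces could fit together, here is a rough sketch: grab a single frame with the browser's getUserMedia API, base64-encode it, and send it alongside the owner's note in one generateContent call. This illustrates the pattern; it is not the app's actual code, and the video handling is simplified.

```ts
import { GoogleGenAI } from "@google/genai";

const ai = new GoogleGenAI({ apiKey: process.env.GEMINI_API_KEY });

// "Live Capture" sketch: grab one frame from the device camera as base64 JPEG.
async function captureFrame(): Promise<string> {
  const stream = await navigator.mediaDevices.getUserMedia({ video: true });
  const video = document.createElement("video");
  video.srcObject = stream;
  video.muted = true;
  await video.play(); // videoWidth/videoHeight are available once playback starts

  const canvas = document.createElement("canvas");
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  canvas.getContext("2d")!.drawImage(video, 0, 0);

  stream.getTracks().forEach((track) => track.stop()); // release the camera

  // Drop the "data:image/jpeg;base64," prefix; the API expects raw base64.
  return canvas.toDataURL("image/jpeg").split(",")[1];
}

// Fuse the captured frame with the owner's observation in a single request.
async function interpretLive(ownerNote: string): Promise<string> {
  const imageBase64 = await captureFrame();
  const response = await ai.models.generateContent({
    model: "gemini-2.5-flash",
    contents: [
      { inlineData: { mimeType: "image/jpeg", data: imageBase64 } },
      { text: ownerNote }, // e.g. "He's been making a whining sound"
    ],
  });
  return response.text ?? "";
}
```
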
By combining these modalities, the app provides a rich, nuanced understanding of bulldog behavior, turning confusing actions into clear, actionable insights for the owner.
