This is a submission for the Built with Google Gemini: Writing Challenge
There's a moment in every hackathon where your idea either becomes embarrassing or brilliant, and you don't find out which until 3am.
Ours was: What if Woody, Buzz, and Mr. Potato Head gave you actual life advice, powered by Gemini AI?
That became Yap & Yap. Here's the story.
What I Built with Google Gemini
Yap & Yap is an interactive advice platform where nine iconic Toy Story characters respond to your real questions, each one in their own voice, with their own personality, powered by Google Gemini.
You type a question (anything from "should I quit my job?" to "how do I tell my roommate their cooking smells?"), select which characters you want to hear from, and get back a wildly different take from each. The roster includes:
- 🤠 Woody — loyal, morally grounded, slightly too sincere
- 🚀 Buzz — heroic, overconfident, solutions that involve space
- 🦖 Rex — spirals into the worst-case scenario immediately
- 🥔 Mr. Potato Head — zero filter, will tell you the truth
- 🧸 Lotso — warm, helpful, and something is definitely off
- 🐷 Hamm — cold cost-benefit analysis, no emotional labor included
After getting all responses, you can click into individual characters for follow-up one-on-ones. Finish the session and you get a "Yapster Certificate", a small celebration of the chaos you just created.
The stack: React + Vite (frontend), Node.js (backend), Tailwind CSS, Google Gemini API for all character responses, deployed on Render.
Gemini's role wasn't just "answer questions." It was carrying nine distinct personalities simultaneously, staying in character across follow-up turns, and making each character feel genuinely different, not just a tone variation on the same base model output.
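Concretely, that means one Gemini request per selected character, each carrying its own persona system prompt. A minimal sketch of the fan-out (the function names, persona strings, and `askGemini` helper are illustrative, not the actual Yap & Yap code; the API call is injected so the dispatch logic stands on its own):

```javascript
// Hypothetical sketch: one Gemini request per selected character,
// each with its own persona system prompt, fired in parallel.
const PERSONAS = {
  woody: "You are Woody. Loyal, morally grounded, slightly too sincere.",
  buzz: "You are Buzz Lightyear. Heroic, overconfident; your solutions involve space.",
  rex: "You are Rex. You spiral into the worst-case scenario immediately.",
};

// `askGemini(systemPrompt, question)` stands in for the real Gemini API
// call, so the fan-out can be exercised without network access.
async function askCharacters(question, selected, askGemini) {
  return Promise.all(
    selected.map(async (name) => ({
      character: name,
      reply: await askGemini(PERSONAS[name], question),
    }))
  );
}
```

Firing the calls in parallel matters for the demo experience: nine sequential model calls would make the response cards trickle in painfully.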
The Real Engineering Problem: Personality at Scale
The hard part wasn't calling the Gemini API. It was keeping nine characters consistently themselves across every possible question a user could throw at them.
Early in the hackathon, our characters started blending together. Woody gave practical advice. Buzz gave practical advice. Mr. Potato Head gave slightly blunter practical advice.
The issue: our system prompts were describing characters instead of being them.
Before:

> You are Mr. Potato Head. He is sarcastic and brutally honest.

After:

> You are Mr. Potato Head. You have a face that comes apart and you've seen things.
> You have no patience for people who can't handle the obvious truth.
> You don't comfort, you clarify. Every answer should feel like a slap the person secretly needed.
That shift from character sheet to voice and worldview immediately changed the output quality. Prompting is design work, not configuration.
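In code, that shift is just how the system prompt string gets assembled. A hedged sketch (the schema and field names are mine, not the actual Yap & Yap data model): instead of storing third-person adjectives, each character stores first-person worldview lines that get joined into the prompt.

```javascript
// Hypothetical persona schema: voice lines written in second person,
// not a third-person character sheet.
function buildPersonaPrompt({ name, voice }) {
  return [
    `You are ${name}.`,
    ...voice, // each line is a rule the character lives by
    "Stay in this voice for every answer. Never describe yourself from the outside.",
  ].join("\n");
}

const potatoHead = buildPersonaPrompt({
  name: "Mr. Potato Head",
  voice: [
    "You have a face that comes apart and you've seen things.",
    "You have no patience for people who can't handle the obvious truth.",
    "You don't comfort, you clarify.",
  ],
});
```

The trailing "stay in this voice" line doubles as insurance against the character-breaking behavior described later.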
Demo
🔗 Live app: yapandyap.onrender.com
💻 GitHub: https://github.com/moeezs/yapandyap
🎥 Demo: https://youtu.be/4SmDS0n6Go0
Ask a question → pick your toys → get chaotic, character-authentic advice → receive your Yapster Certificate.
What I Learned
Prompting is design work. I walked into this hackathon treating prompts like config files. I left treating them like UI copy, something you iterate on, user-test, and refine until the experience clicks. The gap between a mediocre character and a great one lived entirely in how we framed the prompt, not in the model.
Constraints unlock creativity. Working within an existing IP forced us to solve problems we wouldn't have found otherwise. You can't make Woody "edgier" to make him interesting; you have to find what's already compelling about his specific brand of loyalty and moral seriousness. That constraint pushed harder thinking.
Joy is a real metric. The "serious" hackathon projects were technically impressive. But nobody was crowded around them at demo time. People were crowded around ours, asking Mr. Potato Head for relationship advice and screenshotting their certificates. Engagement and delight are valid engineering goals.
Character consistency compounds. The characters that felt most alive weren't just well-prompted on their own, they felt different from each other. Gemini's ability to hold contrasting tones simultaneously (Lotso's warmth vs. Hamm's coldness in the same session) made the whole thing work.
Google Gemini Feedback
What worked incredibly well:
Tonal range. Once we cracked the prompt framing, Gemini held each character's emotional register with surprising consistency. Lotso maintained that unsettling warmth. Jessie spiraled emotionally in ways that felt genuinely impulsive. The model found each character's center of gravity and stayed there across multi-turn conversations.
Contextual memory within sessions. In follow-up chats, Gemini would reference what the character had already said. Buzz would double down on his previous space-based solution. Woody would express concern about what Buzz suggested. We hadn't engineered for this, it emerged from the conversation history naturally.
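In the `@google/generative-ai` SDK, that session memory falls out of the chat history you accumulate and pass along. A sketch of the shape involved (the `toGeminiHistory` helper is mine; the role/parts turn format is the SDK's):

```javascript
// Build the multi-turn history Gemini's chat API expects:
// alternating { role, parts } entries, oldest first.
function toGeminiHistory(turns) {
  return turns.map(({ from, text }) => ({
    role: from === "user" ? "user" : "model",
    parts: [{ text }],
  }));
}

const history = toGeminiHistory([
  { from: "user", text: "Should I quit my job?" },
  { from: "buzz", text: "Have you considered a mission to the outer rim?" },
]);
// This array can then be handed to model.startChat({ history }) so a
// follow-up question lands with full context of what Buzz already said.
```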
Where we hit friction:
Lotso was a nightmare. His character is: sounds sweet, actually manipulative. Getting Gemini to walk that line consistently (helpful enough to seem supportive, subtly off in ways a careful reader would catch) took the most prompt iteration of any character. The model kept defaulting to either fully warm or cartoonishly villainous. Real nuance required real work.
The "helpful AI override." A few times, Gemini would break character mid-response with something like "as an AI, I want to note..." — which completely shattered the illusion. We fixed this by explicitly framing in the prompt that staying in character is the help, and that breaking character defeats the purpose. It mostly resolved after that, but it required deliberate attention.
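A belt-and-suspenders option here (one possible guard, not necessarily what we shipped) is a cheap check for the telltale phrasing before rendering a response, retrying the request when it trips:

```javascript
// Heuristic guard: flag responses where the model steps out of character.
// The phrase list is illustrative, not exhaustive.
const BREAK_PATTERNS = [/\bas an ai\b/i, /\blanguage model\b/i, /\bi am an ai\b/i];

function breaksCharacter(reply) {
  return BREAK_PATTERNS.some((re) => re.test(reply));
}
```

When `breaksCharacter` returns true, re-issue the request, optionally with a stronger stay-in-character reminder appended to the system prompt.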
Response length inconsistency. Rex and Jessie would give sprawling emotional walls of text. Hamm and Mr. Potato Head would respond in two sentences. Personality-appropriate, but visually chaotic when nine cards loaded together. Light per-character length guidance in the prompt smoothed this out.
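That length guidance is just one more line appended to each system prompt. A sketch with hypothetical per-character targets (the hint wording and word counts are invented for illustration):

```javascript
// Hypothetical per-character length hints appended to each system prompt.
const LENGTH_HINTS = {
  rex: "Keep it under 120 words, even mid-panic.",
  jessie: "Keep it under 120 words; big feelings, small paragraphs.",
  hamm: "Two to four sentences is fine, but never fewer than two.",
};

function withLengthHint(systemPrompt, character) {
  const hint = LENGTH_HINTS[character];
  return hint ? `${systemPrompt}\n${hint}` : systemPrompt;
}
```

Characters without an entry keep their natural length, which preserved the personality-appropriate variation while taming the extremes.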
What's Next
The version I want to build has Woody and Buzz arguing with each other about your question in real time, a multi-agent conversation routed through Gemini where you moderate instead of just receive. That's a different architecture challenge, but it's the natural evolution of what we started.
I also want to explore Gemini's multimodal capabilities here. Imagine uploading a photo of a situation and letting the toys react to what they see. That feels very on-brand.
To infinity and beyond, or at least to the next hackathon. 🚀
