
Ayush Pathak


Building a Multi-Agent Ghost Story: How Kiro’s Hybrid Development Changed Everything

5 AI agents debating in 2-3 seconds. Built in days, not weeks. Here's how.


TL;DR

Using Kiro IDE, I built a gothic ghost story in which 5 independent AI agents debate in real time. The twist? The agents were built with DIFFERENT development approaches (vibe coding, spec-driven, steering docs), yet they work together as one coherent family.

This is the Frankenstein magic: incompatible paradigms stitched together into something unexpectedly powerful.

Tech Stack: Next.js, Groq (llama-3.3-70b), Azure TTS, Kiro IDE

⚡ Live Demo: https://midnightatthevossmanor.vercel.app/

📦 Source Code: https://github.com/AyushPathak2610/Midnight-at-the-Voss-Manor

🎮 Try it: Click "Click to begin" and watch 5 AI personalities debate your choices


"The agents genuinely disagree, creating emergent storytelling that's never the same twice."


The Problem: Building Consistent AI Personalities at Scale

When you're building a game with 5 AI agents that need to:

  • Maintain distinct personalities
  • Debate with each other in real-time
  • Stay consistent across 50+ generations
  • Create authentic conflict (not forced agreement)

...traditional AI development becomes a nightmare of prompt engineering, context management, and constant tweaking.

I needed a better way.


Enter Kiro: The Frankenstein Approach

Kiro IDE introduced me to something revolutionary: you don't need to pick ONE development paradigm. You can use different approaches for different problems and let Kiro stitch them together.

Here's how I built each ghost character:

1. Vibe Coding: Elara (The Mother)

The Conversation:

Me: "I need a maternal ghost character"
Kiro: "What's her role?"
Me: "She's the mother. Gentle, prioritizes family harmony"
Kiro: [Generates initial personality]
Me: "More poetic. Less formal. Under 30 words per response."
Kiro: [Refines to final version]

Result: 5 minutes to a fully formed character with emotional depth.

Why Vibe Coding Worked:

  • Fast iteration on "feeling"
  • Hard to spec "maternal warmth" formally
  • Natural language captures nuance better than technical specs

2. Spec-Driven: Harlan (The Scientist)

The Spec:

harlan: {
  name: 'Harlan',
  personality: 'scientific, amnesiac, logical but confused',
  systemPrompt: `You are Dr. Harlan Voss, the scientist ghost 
  with fragmented memories. You analyze problems logically but 
  struggle to remember emotional context. When debating, you 
  cite facts but defer to family on emotional matters. Keep 
  responses under 30 words. Use technical language mixed with 
  uncertainty.`
}

Why Spec-Driven Worked:

  • Rock-solid consistency
  • Debuggable ("Line 47 violates spec")
  • Perfect for logical, technical characters
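The "under 30 words" rule in Harlan's spec is the kind of constraint that can be checked mechanically, which is part of what makes a spec debuggable. Here is a minimal sketch, assuming a simple whitespace-based word count. `countWords` and `checkWordLimit` are hypothetical helpers for illustration, not code from the project:

```typescript
// Hypothetical helpers (not from the project): check a response against
// the "under 30 words" rule stated in the system prompt.
const MAX_WORDS = 30

function countWords(text: string): number {
  const trimmed = text.trim()
  return trimmed === '' ? 0 : trimmed.split(/\s+/).length
}

interface LimitCheck {
  ok: boolean
  words: number
}

// "Under 30 words" is read strictly here: 29 passes, 30 fails.
function checkWordLimit(response: string, maxWords: number = MAX_WORDS): LimitCheck {
  const words = countWords(response)
  return { ok: words < maxWords, words }
}

const sample = 'I... categories. Logic. But family transcends data.'
console.log(checkWordLimit(sample))
```

A check like this could run in a test or log a warning at runtime, so a drifting prompt shows up as a failed assertion rather than a vague feeling that the character is "off".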

3. Steering Documents: The Family Dynamic

File: .kiro/steering/ghost-agent-rules.md

## Inter-Agent Debate Protocol

- Each agent MUST respond independently
- Agents can disagree - conflict is good for drama
- Mira often sides with emotional choices
- Harlan provides logical analysis but defers to family
- Selene demands honesty, Theo seeks forgiveness

Why Steering Worked:

  • Prevented personality mix-ups across 50+ generations
  • Defined relationships BETWEEN agents, not just individual traits
  • Created authentic conflict instead of forced agreement

4. MCP Server: Theo's Eternal Vow Verification

This is where it gets really interesting. We built a custom Model Context Protocol (MCP) server that acts as a blockchain-style ledger for eternal vows.

The Concept: In the Hallway scene, players can verify if Theo kept his promise to return to Selene. The MCP server stores these vows and returns verification results.

File: app/api/mcp/vows/route.ts

import { NextRequest, NextResponse } from 'next/server'

// Simple in-memory vow ledger (simulates blockchain MCP)
const vows = new Map([
  ['theo-marry-selene', { 
    person: 'Theo', 
    vow: 'Marry Selene', 
    kept: false, 
    timestamp: '2039-01-15' 
  }],
  ['theo-return', { 
    person: 'Theo', 
    vow: 'Return to make amends', 
    kept: true, 
    timestamp: '2039-06-20' 
  }],
])

export async function POST(req: NextRequest) {
  const { action, person, vow } = await req.json()

  if (action === 'check') {
    const key = `${person.toLowerCase()}-${vow.toLowerCase().replace(/\s+/g, '-')}`
    const record = vows.get(key)

    if (record) {
      return NextResponse.json({
        found: true,
        ...record,
        message: record.kept 
          ? `✓ Vow kept: ${record.person} did ${record.vow} on ${record.timestamp}`
          : `✗ Vow broken: ${record.person} failed to ${record.vow}`
      })
    }

    return NextResponse.json({ 
      found: false, 
      message: 'No record of this vow in the ledger' 
    })
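  }

  // Unknown action: answer with an explicit 400 instead of returning nothing
  return NextResponse.json(
    { error: `Unsupported action: ${action}` },
    { status: 400 }
  )
}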

In-Game Integration (HallwayScene.tsx):

const handleCheckVow = async () => {
  try {
    const response = await fetch('/api/mcp/vows', {
      method: 'POST',
      headers: { 'Content-Type': 'application/json' },
      body: JSON.stringify({ 
        action: 'check', 
        person: 'Theo', 
        vow: 'Return' 
      })
    })

    const result = await response.json()
    setVowResult(result.message)

    // Selene speaks the eternal record
    speechService.speak(result.message, 'selene')
  } catch (error) {
    console.error('Failed to check vow:', error)
  }
}

The Player Experience:

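When players click "Check Theo's Vow" in the Hallway scene, the API checks the ledger. The lookup goes by map key, not by the stored vow text: the game sends vow: 'Return', which normalizes to the key theo-return, and that key exists in the ledger. With the route as shown above, the response message is:

The Eternal Record:
"✓ Vow kept: Theo did Return to make amends on 2039-06-20"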

Note: The MCP server (mcp-servers/blockchain-vows-server.js) contains the full 4-vow ledger used during development, while the runtime API has a simplified 2-vow version. This demonstrates the separation between development tools and production features.
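To see which requests hit the ledger, it helps to isolate the key derivation from the route handler. The sketch below copies that one expression into a pure function; the name `vowKey` is mine, not the project's:

```typescript
// Same key derivation as the route handler, pulled out as a pure function.
// The function name is hypothetical; the expression is from the route above.
function vowKey(person: string, vow: string): string {
  return `${person.toLowerCase()}-${vow.toLowerCase().replace(/\s+/g, '-')}`
}

console.log(vowKey('Theo', 'Return'))                // theo-return
console.log(vowKey('Theo', 'Marry Selene'))          // theo-marry-selene
console.log(vowKey('Theo', 'Return to make amends')) // theo-return-to-make-amends
```

The first two keys exist in the two-entry ledger. The third does not, so sending the full vow text would produce the "No record of this vow in the ledger" response even though the vow itself is stored.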

Why MCP Was Perfect Here:

  • Development-time testing: We could test vow verification directly in Kiro IDE without deploying
  • Separation of concerns: The vow logic lives in a dedicated server, not cluttering game code
  • Extensibility: Easy to add more vows, more characters, more verification types
  • Narrative depth: Players discover story through interaction, not exposition

The Meta Layer: During development, Kiro's agent could call the MCP server to verify vows while helping us write the narrative. The same tool that helps us BUILD the game is also PART of the game.



The Magic Moment: Real-Time Debates

💡 Pro Tip: This is where the Frankenstein magic happens - 5 independent systems working as one.

When you click "Ask Ghosts for Hint" during a puzzle, here's what happens:


// Invoke all 5 agents in parallel
const debatePromises = Object.keys(GHOST_AGENTS).map(async (ghostName) => {
  const response = await invokeGhostAgent(ghostName, context, apiKey)
  return { ghost: ghostName, message: response }
})

const debate = await Promise.all(debatePromises)
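One thing to keep in mind with `Promise.all`: it rejects as soon as any single call fails, so one slow or failing agent drops the whole debate. A possible hardening, sketched under assumptions: `invoke` stands in for `invokeGhostAgent` (whose real signature also takes context and an API key), and the timeout and fallback line are values I made up for illustration.

```typescript
type GhostReply = { ghost: string; message: string; failed: boolean }
type Invoke = (ghostName: string) => Promise<string>

// Reject if the wrapped promise has not settled within `ms` milliseconds.
function withTimeout<T>(promise: Promise<T>, ms: number): Promise<T> {
  return new Promise<T>((resolve, reject) => {
    const timer = setTimeout(() => reject(new Error(`timed out after ${ms}ms`)), ms)
    promise.then(
      (value) => { clearTimeout(timer); resolve(value) },
      (error) => { clearTimeout(timer); reject(error) },
    )
  })
}

// Catch errors per agent so the outer Promise.all never rejects:
// a failed ghost contributes a fallback line instead of sinking the debate.
async function runDebate(
  ghosts: string[],
  invoke: Invoke,
  timeoutMs: number = 5000,   // assumed value
  fallback: string = '...',   // assumed placeholder line
): Promise<GhostReply[]> {
  return Promise.all(
    ghosts.map(async (ghost) => {
      try {
        const message = await withTimeout(invoke(ghost), timeoutMs)
        return { ghost, message, failed: false }
      } catch {
        return { ghost, message: fallback, failed: true }
      }
    }),
  )
}
```

The calls still run in parallel, so total latency stays close to the slowest successful agent, capped by the timeout.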

Example Debate:

Player: "I need help with the tapestry puzzle"

Elara: "Focus on love and emotional memories, dear one."
Harlan: "I... categories. Logic. But family transcends data."
Mira: "The happy ones! When we played together!"
Theo: "Your family moments define you, brother."
Selene: "Truth matters. Match honestly, not hopefully."

Consensus (Elara): "Look for the emotional connections in 
each photo—love, joy, and family bonds will guide you."

This is never the same twice. The agents genuinely disagree, creating emergent storytelling.


What Changed My Development Workflow

Before Kiro:

  1. Write prompt in ChatGPT
  2. Copy code to VS Code
  3. Test, find issues
  4. Go back to ChatGPT
  5. Lose context
  6. Repeat 10 times

Time per character: ~2 hours

With Kiro:

  1. Natural conversation: "Make Elara more poetic"
  2. Kiro refines in real-time
  3. Code is already in my project
  4. Context is preserved
  5. Done in 5 minutes

Time per character: ~5 minutes

"No context switching. No manual prompt engineering. Just natural conversation → production code."

Result: Built 5 distinct AI personalities in under an hour instead of 10+ hours.


The Frankenstein Chimera: Why Incompatible Parts Work


Each agent was built DIFFERENTLY:

  • 🎨 Vibe-coded Elara (fluid, emotional)
  • 📋 Spec-driven Harlan (rigid, logical)
  • 📖 Steering-enforced Mira (rule-based simplicity)

They shouldn't work together. But they do.

"Steering documents act as the 'stitches' that hold the chimera together. They prevent chaos while allowing authentic conflict."

Why it works: You can SEE the seams (agents disagree), but they form something greater than the sum of their parts.

The lesson: Don't hide your seams - embrace them. Authentic conflict makes better AI.


The Complete AI Stack

This project showcases AI across every modality:

  • AI Agents: Groq (llama-3.3-70b) - 5 debating personalities
  • Voice Acting: Azure TTS - 6 unique neural voices
  • Scene Images: Google Gemini - 30 gothic-cyberpunk scenes
  • Background Music: Suno AI - 4 atmospheric scores
  • MCP Servers: Custom blockchain vow verification, memory persistence
  • Development: Kiro IDE - vibe + spec + steering + MCP integration

This is the ultimate Frankenstein: Not just stitching together code paradigms, but stitching together ENTIRE AI SYSTEMS across different modalities to create one cohesive experience.


📊 By The Numbers

  • 5 AI agents debating simultaneously
  • 2-3 seconds for complete debate (thanks to Groq's speed)
  • 4 TTS providers with automatic fallbacks
  • 30+ AI-generated scene images
  • 95% reduction in off-brand responses (steering docs)
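The "automatic fallbacks" between TTS providers can be pictured as trying each provider in order until one succeeds. This is a rough sketch of that idea only: the `TtsProvider` shape and `speakWithFallback` are hypothetical, and no real provider SDK is called here.

```typescript
// Hypothetical provider shape: each provider turns text into audio bytes.
interface TtsProvider {
  name: string
  synthesize: (text: string) => Promise<Uint8Array>
}

// Try providers in order and return the first successful result.
// If every provider fails, throw one error that lists each failure.
async function speakWithFallback(
  text: string,
  providers: TtsProvider[],
): Promise<{ provider: string; audio: Uint8Array }> {
  const errors: string[] = []
  for (const provider of providers) {
    try {
      const audio = await provider.synthesize(text)
      return { provider: provider.name, audio }
    } catch (error) {
      const reason = error instanceof Error ? error.message : String(error)
      errors.push(`${provider.name}: ${reason}`)
    }
  }
  throw new Error(`All TTS providers failed (${errors.join('; ')})`)
}
```

Ordering the list by preference (best voice quality first, cheapest or most reliable last) keeps the common path fast while still giving the player audio when the primary provider is down.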

Key Takeaways for Developers

1. Don't Pick One Paradigm - Use Them All

  • 🎨 Vibe coding for creativity and emotion
  • 📋 Spec-driven for logic and consistency
  • 📖 Steering docs for relationships and rules

2. Steering Documents Are Underrated

They prevented 95% of "off-brand" responses. Define relationships BETWEEN agents, not just individual traits.

"Define relationships BETWEEN agents, not just individual traits. That's where the magic happens."

3. Real-Time Multi-Agent Systems Are Possible

5 parallel API calls to Groq, independent responses, real-time synthesis. It works!

4. Context Preservation Changes Everything

No more copy-pasting between tools. Kiro keeps context across the entire development session.

5. Embrace the Seams

The "incompatible parts" (vibe vs spec vs steering) create authentic conflict. That's a FEATURE, not a bug.


🎯 Key Takeaways (Copy These!)

  1. Don't pick one paradigm - Mix vibe coding (creativity) + spec-driven (consistency) + steering docs (relationships)
  2. Steering documents prevent 95% of off-brand responses - Define relationships BETWEEN agents, not just traits
  3. Real-time multi-agent systems work - 5 parallel Groq calls in 2-3 seconds
  4. Context preservation is everything - No more copy-pasting between tools
  5. Embrace the seams - Incompatible parts create authentic conflict (that's a feature!)

Try It Yourself

git clone https://github.com/AyushPathak2610/Midnight-at-the-Voss-Manor
cd Midnight-at-the-Voss-Manor
npm install
cp .env.example .env
# Add GROQ_API_KEY to .env (free at console.groq.com)
npm run dev

Then: Click "Ask Ghosts for Hint" and watch 5 AI agents debate in real-time.

Challenge: Can you add a 6th ghost? Share your results in the comments! 👇


The Future: Multi-Modal AI Development

This project proves that we can stitch together:

  • Different AI systems (Groq, Azure, Gemini, Suno)
  • Different development paradigms (vibe, spec, steering)
  • Different modalities (text, speech, image, audio)

...into one coherent experience.

That's the Frankenstein vision: Not just building with AI, but building WITH multiple AIs, each specialized for its domain, all orchestrated by Kiro.


📚 Resources & Links

  • 🎮 Play the Game: https://midnightatthevossmanor.vercel.app/
  • 💻 Source Code: https://github.com/AyushPathak2610/Midnight-at-the-Voss-Manor
  • 🛠️ Kiro IDE: kiro.dev
  • ⚡ Groq API: console.groq.com (Free tier available!)

💬 Let's Discuss!

I want to hear from you:

  1. Have you tried multi-agent systems? What challenges did you face?
  2. Which development paradigm do you prefer: vibe coding, spec-driven, or steering?
  3. What would YOU add as a 6th ghost character?

Drop your thoughts in the comments! I respond to everyone. 👇


🙏 If You Found This Helpful

  • ❤️ Give it a heart
  • 🔖 Bookmark for later
  • 🔄 Share with your dev friends
  • 👤 Follow me for more AI development insights

Built for the #Kiroween Frankenstein Hackathon 2025

Tags: #kiro #AI #GameDev #MultiAgent #Groq #NextJS #LLM #TTS #AgentDriven #Frankenstein


🎃 About the Hackathon

This project was built for Kiro's Frankenstein category: "Stitch together a chimera of technologies into one app." We combined:

  • 5 LLM agents (Groq)
  • 4 TTS providers (Azure, Google, Play.ht, Gemini)
  • AI-generated art (Gemini)
  • AI-generated music (Suno)
  • Custom MCP servers
  • 3 development paradigms

The result? Something unexpectedly powerful - just like Frankenstein's monster.


Appendix: Technical Deep Dive

Agent Architecture

Each agent has:

  1. Name - Display name
  2. Personality - One-line description
  3. System Prompt - Detailed behavior definition (caps each response at under 30 words)
  4. Voice - Unique Azure TTS neural voice

Debate Flow

1. Player triggers hint request
2. API route receives puzzle context
3. 5 parallel Groq API calls (Promise.all)
4. Each agent responds independently (temperature=0.8)
5. Elara synthesizes consensus from all perspectives
6. Responses spoken with unique voices (Azure TTS)
7. Display in real-time to player
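Step 5 implies a second pass in which the synthesizing agent sees what the others said. Below is a sketch of how that prompt might be assembled. `buildConsensusPrompt` is a hypothetical helper and the instruction wording is my assumption, not the project's actual prompt:

```typescript
type GhostReply = { ghost: string; message: string }

// Hypothetical helper: fold the independent replies into one prompt
// that the synthesizing agent (Elara in this article) answers last.
function buildConsensusPrompt(question: string, replies: GhostReply[]): string {
  const transcript = replies
    .map((reply) => `${reply.ghost}: ${reply.message}`)
    .join('\n')
  return [
    `The player asked: ${question}`,
    'The family answered:',
    transcript,
    'Summarize the family consensus in under 30 words.',
  ].join('\n')
}

const prompt = buildConsensusPrompt('I need help with the tapestry puzzle', [
  { ghost: 'Harlan', message: 'I... categories. Logic. But family transcends data.' },
  { ghost: 'Selene', message: 'Truth matters. Match honestly, not hopefully.' },
])
console.log(prompt)
```

Because the consensus call depends on the other replies, it cannot join the parallel batch; it adds one sequential round trip after the debate completes.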

Steering Document Strategy

Key insight: Define relationships, not just traits.

Bad steering:

Elara is maternal
Harlan is logical

Good steering:

When debating:
- Mira sides with emotional choices
- Harlan provides logic but defers to family
- Selene demands honesty
- Elara synthesizes consensus

The second approach creates authentic family dynamics.
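If you want the same relationship rules available to runtime code as well as to the steering file, one option is to keep them as data and render the bullet list from it. This is a sketch of that idea, not how the project does it: the article keeps these rules as Markdown in the steering file, and `DebateRule` and `renderRules` are hypothetical names.

```typescript
// Hypothetical representation of the debate rules. The article stores
// these as Markdown bullets in a steering file, not as code.
interface DebateRule {
  agent: string
  behavior: string
}

const debateRules: DebateRule[] = [
  { agent: 'Mira', behavior: 'sides with emotional choices' },
  { agent: 'Harlan', behavior: 'provides logic but defers to family' },
  { agent: 'Selene', behavior: 'demands honesty' },
  { agent: 'Elara', behavior: 'synthesizes consensus' },
]

// Render the rules in the same bullet form the steering file uses.
function renderRules(rules: DebateRule[]): string {
  const bullets = rules.map((rule) => `- ${rule.agent} ${rule.behavior}`)
  return ['When debating:'].concat(bullets).join('\n')
}

console.log(renderRules(debateRules))
```

Each rule names an agent and how it behaves relative to the others, which is the "relationships, not traits" point in a form that is easy to extend when a new ghost joins the family.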


Performance Notes

  • 5 parallel API calls: ~2-3 seconds total (Groq is FAST)
  • Azure TTS caching: Repeated phrases cached locally
  • Music streaming: 15% volume, seamless loops
  • Image optimization: Next.js Image component with priority loading
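The TTS caching note above boils down to remembering audio by voice and text. Here is a minimal in-memory sketch, assuming a `Synthesize` function signature of my own invention; the real Azure call and the project's actual cache storage are not shown in this article.

```typescript
// Hypothetical synthesizer signature; the real Azure call is not shown here.
type Synthesize = (text: string, voice: string) => Promise<Uint8Array>

// Wrap a synthesizer so repeated (voice, text) pairs reuse the first result.
// Caching the promise (not the bytes) also de-duplicates concurrent requests.
function withCache(synthesize: Synthesize): Synthesize {
  const cache = new Map<string, Promise<Uint8Array>>()
  return (text, voice) => {
    const key = JSON.stringify([voice, text])
    let pending = cache.get(key)
    if (!pending) {
      pending = synthesize(text, voice)
      // Drop failed entries so a transient error is not cached forever.
      pending.catch(() => cache.delete(key))
      cache.set(key, pending)
    }
    return pending
  }
}
```

Fixed lines, such as a recurring greeting or the vow verdict, benefit most from this; the freshly generated debate replies are different every time and will mostly miss the cache.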

Future Improvements

  1. Memory system: Agents remember previous conversations
  2. Emotional state: Agents' moods change based on player choices
  3. Branching narrative: Multiple endings based on agent consensus
  4. Voice cloning: Custom voices for each character
  5. Real-time animation: Lip-sync with TTS output

Thanks for reading! If you found this helpful, give it a ❤️ and follow me for more AI development insights.
