
Quentin Merle


💎 GemMaster: Immersive Core RPG — Orchestrating Narrative Absurdity with Gemma 4


This is a submission for the Gemma 4 Challenge: Build with Gemma 4

What I Built

GemMaster is a narrative engine designed to move beyond the traditional "chat window" paradigm. It turns the classic text adventure into a cinematic experience and links the digital and physical worlds through multimodal AI vision, all while running 100% locally on your machine via Ollama.

[Image: Welcome to GemMaster]

🚀 The Vision: Testing the "Brain" of Local Models

I wanted to see what a local model is really made of: could it act as a rigid Game Logic Orchestrator while keeping a cinematic soul? GemMaster shows that with rigorous "engine-level" prompting and a clever frontend, a small model like Gemma 4 E4B can deliver a surprisingly deep and interactive experience directly on consumer hardware, with total privacy.

⚠️ IMPORTANT
Performance Note: I've optimized this engine specifically for Gemma 4 E4B and larger. Due to the high complexity of the multi-tag protocol, the E2B model may experience "formatting drift" in long sessions.

🏗️ Video Game Architecture: The Anatomy of a Turn

Unlike standard LLM chats, GemMaster treats every response as a Game Frame.

  • The Wizard’s Spark: Choices (Universe, Tone, Language) are converted into a dynamic JSON configuration injected into the base prompt.
  • Linguistic Sovereignty: Dynamic system reminders prevent "Language Drift," keeping narrations and tags perfectly aligned with the user's locale.
  • Continuity: A session-lock system and a Markdown-based journal ensure long-term narrative consistency.
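
The first two points can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the code from the repository: the `BASE_PROMPT` wording, the function names, and the reminder text are all hypothetical, and only the three wizard choices (Universe, Tone, Language) come from the description above.

```python
import json

# Hypothetical base prompt; the real engine prompt is not shown in this post.
BASE_PROMPT = (
    "You are the game engine for an interactive RPG.\n"
    "Session configuration (JSON):\n{config}\n"
    "Always narrate in the configured language."
)


def build_system_prompt(universe: str, tone: str, language: str) -> str:
    """Serialize the wizard choices to JSON and inject them into the base prompt."""
    config = {"universe": universe, "tone": tone, "language": language}
    return BASE_PROMPT.format(
        config=json.dumps(config, ensure_ascii=False, indent=2)
    )


def language_reminder(language: str) -> str:
    """Short reminder (assumed wording) re-sent each turn to limit language drift."""
    return f"[SYSTEM REMINDER] Write all narration and tag labels in {language}."


if __name__ == "__main__":
    print(build_system_prompt("Cyberpunk", "Mystery", "French"))
    print(language_reminder("French"))
```

Keeping the configuration as JSON means the same object could plausibly drive both the prompt and the frontend, for example to pick the background color for the chosen tone.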

[Image: GemMaster Wizard]


Demo

GemMaster features a Liquid Glass Design with hardware-accelerated CSS filters and dynamic "Ambilight" backgrounds that shift based on the story tone (Action = Red, Tension = Purple, Mystery = Blue).

[Image: Tactical Dice Roll result]

👁️ Multimodal Immersion (Experimental)

  • 🎙️ Voice of the Director: Uses the Web Speech API to read the AI's internal intentions aloud, taken from the <voiceover> tag.
  • 📸 Visual Portal: A high-tech laser-scanning interface for image analysis challenges, bridging the gap between the physical world and the narrative.
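
As a rough sketch of the voiceover step, the snippet below separates `<voiceover>` content from the visible narration so the spoken lines can be handed to a speech layer. It assumes the tag is written as a paired `<voiceover>...</voiceover>` element; the function name is hypothetical, and the speech synthesis itself happens in the browser, so it is not shown here.

```python
import re

# Assumed tag shape: <voiceover>...</voiceover>, possibly spanning several lines.
VOICEOVER_RE = re.compile(
    r"<voiceover>(.*?)</voiceover>", re.DOTALL | re.IGNORECASE
)


def split_voiceover(response: str) -> tuple[str, list[str]]:
    """Return (narration without voiceover tags, list of voiceover texts)."""
    spoken = [body.strip() for body in VOICEOVER_RE.findall(response)]
    narration = VOICEOVER_RE.sub("", response).strip()
    return narration, spoken


if __name__ == "__main__":
    text = (
        "The door creaks open. "
        "<voiceover>Let them feel the dread.</voiceover> "
        "You step inside."
    )
    narration, spoken = split_voiceover(text)
    print(narration)
    print(spoken)
```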

[Image: Multimodal Vision in action]


Code

You can explore the source code and run the engine yourself here:
👉 GemMaster on GitHub

git clone https://github.com/QuentinMerle/gemmaster.git
cd gemmaster
./install.sh
python main.py

How I Used Gemma 4

I chose Gemma 4 E4B for its balance between reasoning capability and local performance.

🛠️ Taming a 4B Model: The "Mechanical Toolkit"

I gave Gemma a full toolbox of interactive skills. The model doesn't just write; it triggers specialized components:

  • 🎲 [[CHECK: Stat|DC]]: Triggers a deterministic 3D dice roll.
  • ⚡ [[SKILL: QTE]]: Triggers a physical Quick Time Event.
  • 👁️ [[SKILL: VISION]]: Triggers real-world image analysis.
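
To make the tag protocol concrete, here is a minimal parsing sketch in Python. The tag shapes (`[[CHECK: Stat|DC]]`, `[[SKILL: NAME]]`) come from the list above; the regex, the `Tag` dataclass, and `parse_tags` are assumptions for illustration and may differ from what the engine actually does.

```python
import re
from dataclasses import dataclass
from typing import Optional

# Matches [[CHECK: Stat|DC]] and [[SKILL: NAME]]; the DC part is optional.
TAG_RE = re.compile(
    r"\[\[\s*(CHECK|SKILL)\s*:\s*([^\]|]+?)\s*(?:\|\s*(\d+)\s*)?\]\]"
)


@dataclass
class Tag:
    kind: str           # "CHECK" or "SKILL"
    name: str           # stat name for CHECK, skill name for SKILL
    dc: Optional[int]   # difficulty class, only present on CHECK tags


def parse_tags(text: str) -> list[Tag]:
    """Extract every mechanic tag from a complete model response."""
    tags = []
    for kind, name, dc in TAG_RE.findall(text):
        tags.append(Tag(kind, name, int(dc) if dc else None))
    return tags


if __name__ == "__main__":
    reply = "You leap across the gap. [[CHECK: Agility|15]] Brace! [[SKILL: QTE]]"
    for tag in parse_tags(reply):
        print(tag)
```

Each parsed tag can then be routed to the matching component (dice roll, QTE, or vision), while the rest of the text is displayed as narration.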

[Image: Gemma 4 can trigger QTE]

🎨 Creative Constraints: Freedom through Structure

By enforcing tags for mechanics, I free the model's "brain" from worrying about how to resolve actions. It just triggers the tag, and then focuses 100% of its attention on the quality of the prose.

🔍 Behind the Glass: The "Cheat" of Immersion

  • The "Silent Shepherd": Hidden rule reminders appended to every user message to prevent "Model Drift."
  • The Atomic Parser: A custom regex engine extracting tags from the stream in real-time.
  • Deterministic Resolution: Offloading game logic to the frontend using seeded randomness to ensure "fair" play.
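
The three tricks above can be sketched as follows. This is a simplified Python illustration, not the engine's code: in GemMaster the resolution runs in the frontend, and the reminder wording, the class and function names, the d20 roll, and the way the seed is derived from a session id and a turn number are all assumptions made for the example.

```python
import random
import re

# Hypothetical reminder text; the real hidden rules are not shown in this post.
RULES_REMINDER = "[REMINDER] Use [[CHECK: Stat|DC]] or [[SKILL: ...]] tags for mechanics."
TAG_RE = re.compile(r"\[\[(.*?)\]\]", re.DOTALL)


def shepherd(user_message: str) -> str:
    """Append the hidden rule reminder to the player's message."""
    return f"{user_message}\n\n{RULES_REMINDER}"


class StreamTagExtractor:
    """Buffers streamed text so tags split across chunks are still caught."""

    def __init__(self) -> None:
        self._buffer = ""

    def feed(self, chunk: str) -> tuple[str, list[str]]:
        """Return (text safe to display, bodies of tags completed by this chunk)."""
        self._buffer += chunk
        tags = [body.strip() for body in TAG_RE.findall(self._buffer)]
        text = TAG_RE.sub("", self._buffer)
        # Hold back an unfinished tag (or a lone trailing "[") until more arrives.
        cut = text.find("[[")
        if cut == -1 and text.endswith("["):
            cut = len(text) - 1
        if cut == -1:
            self._buffer = ""
            return text, tags
        self._buffer = text[cut:]
        return text[:cut], tags


def resolve_check(session_seed: str, turn: int, modifier: int, dc: int) -> dict:
    """Roll a d20 from a per-turn seed so the same turn always gives the same result."""
    rng = random.Random(f"{session_seed}:{turn}")
    roll = rng.randint(1, 20)
    total = roll + modifier
    return {"roll": roll, "total": total, "success": total >= dc}


if __name__ == "__main__":
    extractor = StreamTagExtractor()
    for chunk in ["You jump [[CHE", "CK: Agility|15]] and land."]:
        print(extractor.feed(chunk))
    print(resolve_check("session-42", turn=3, modifier=2, dc=15))
```

Because the roll depends only on the seed and the turn number, the model never decides the outcome itself; it only asks for a check, and replaying the same turn yields the same result.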

Conclusion

💎 GemMaster proves that small local models like Gemma 4 are capable of high-fidelity, multimodal orchestration. I’ve tried to build more than just a game; I hope it serves as a modest exploration of what is now possible in the local AI era.

📝 A Final Note

While I’ve spent far more time on this than originally planned, this is still an experimental engine. There may be some bugs or narrative "glitches" along the way. I appreciate your patience, and most of all, I hope you enjoy the adventure!

Top comments (2)

GoDaddy LLC

This is a really creative example of treating an LLM less like a chatbot and more like a real-time game orchestration engine 😄.
The structured tag system is especially smart because it separates narrative creativity from deterministic gameplay mechanics — which is honestly how many commercial game engines handle complexity too.
I also like the decision to keep everything local with Ollama; immersive AI experiences feel very different when latency and privacy concerns disappear.
The “cinematic frame” concept combined with multimodal triggers and dynamic UI effects makes this feel closer to an AI-powered RPG director than a traditional text adventure.
And the hidden “Silent Shepherd” reminders are basically the AI equivalent of a dungeon master quietly flipping through rulebooks behind the screen 😂.
The most impressive part is probably getting a relatively small local model to maintain narrative consistency under such a structured protocol. Really fascinating exploration of what local multimodal AI can become beyond simple chat interfaces.

Quentin Merle

Thank you so much for the amazing feedback! It really means a lot to see that you captured the essence of the architecture and the intention behind every mechanic with such precision. This is exactly what I was hoping to convey with this project! 😄