DEV Community

Bunny Sleuth

elsewhere, a text-to-3D studio

Built with Google Gemini: Writing Challenge

This is a submission for the Built with Google Gemini: Writing Challenge

What I Built with Google Gemini

I built a high-performance text-to-3D model studio that runs straight from the browser! A user describes what they want in natural language, anything from “cute cat” to “floating pizza with laser eyes”, and Gemini generates an interactive 3D model (in Three.js).
Asset generation is a two-phase pipeline:
## Planning phase
Gemini receives the user prompt, with PLANNING_SYSTEM_PROMPT_V4 as the systemInstruction (temperature 0.5, thinkingLevel: 'low', max 8192 output tokens). It returns a v3 schema JSON: an array of 3-6 materials (color, roughness, metalness) and 4-12 parts, each specifying a geometry type (Box|Sphere|Cylinder|Cone|Torus|Lathe|Tube|Dome), parent reference, priority (1-3), material index, geometry parameters, and instance transforms (position/rotation/scale arrays). The LLM never writes executable code; instead, it describes geometry in a constrained JSON vocabulary.
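To make the constrained vocabulary concrete, here is a rough sketch of what a plan and a basic shape check might look like. The field names (`name`, `params`, `instances`, and so on) and the `checkPlanShape` helper are illustrative guesses, not the project's actual v3 schema; only the counts, the geometry names, and the priority range come from the description above.

```javascript
// Hypothetical plan in the shape the post describes. Field names are
// illustrative guesses, not the project's actual v3 schema.
const examplePlan = {
  materials: [
    { color: "#f4a460", roughness: 0.8, metalness: 0.0 },
    { color: "#ffffff", roughness: 0.6, metalness: 0.0 },
    { color: "#222222", roughness: 0.4, metalness: 0.1 },
  ],
  parts: [
    { name: "body", geometry: "Sphere", parent: null, priority: 1, material: 0,
      params: { radius: 1 },
      instances: [{ position: [0, 1, 0], rotation: [0, 0, 0], scale: [1, 0.9, 1.3] }] },
    { name: "head", geometry: "Sphere", parent: "body", priority: 1, material: 0,
      params: { radius: 0.6 },
      instances: [{ position: [0, 0.9, 1.1], rotation: [0, 0, 0], scale: [1, 1, 1] }] },
    { name: "ear", geometry: "Cone", parent: "head", priority: 2, material: 0,
      params: { radius: 0.2, height: 0.4 },
      instances: [
        { position: [-0.3, 0.55, 0], rotation: [0, 0, 0], scale: [1, 1, 1] },
        { position: [0.3, 0.55, 0], rotation: [0, 0, 0], scale: [1, 1, 1] },
      ] },
    { name: "tail", geometry: "Tube", parent: "body", priority: 3, material: 1,
      params: { radius: 0.08 },
      instances: [{ position: [0, 0.3, -1.2], rotation: [0.6, 0, 0], scale: [1, 1, 1] }] },
  ],
};

// The eight geometry names listed in the post.
const GEOMETRY_TYPES = new Set([
  "Box", "Sphere", "Cylinder", "Cone", "Torus", "Lathe", "Tube", "Dome",
]);

// Returns a list of human-readable problems; an empty list means the plan
// stays inside the vocabulary described above.
function checkPlanShape(plan) {
  const problems = [];
  const materialCount = plan.materials?.length ?? 0;
  const partCount = plan.parts?.length ?? 0;
  if (materialCount < 3 || materialCount > 6) {
    problems.push(`expected 3-6 materials, got ${materialCount}`);
  }
  if (partCount < 4 || partCount > 12) {
    problems.push(`expected 4-12 parts, got ${partCount}`);
  }
  for (const part of plan.parts ?? []) {
    if (!GEOMETRY_TYPES.has(part.geometry)) {
      problems.push(`${part.name}: unknown geometry "${part.geometry}"`);
    }
    if (!(part.priority >= 1 && part.priority <= 3)) {
      problems.push(`${part.name}: priority must be between 1 and 3`);
    }
    if (!Number.isInteger(part.material) || part.material < 0 || part.material >= materialCount) {
      problems.push(`${part.name}: material index out of range`);
    }
  }
  return problems;
}
```

Because the model only fills in data like this, a malformed response can be rejected or repaired before any Three.js code is generated.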
## Compilation phase
SchemaCompiler.compile() runs five deterministic steps with no LLM involvement:
- Parse: normalize JSON, expand defaults, resolve material references
- Validate: check required fields, parent references, topological sort
- Budget: prune parts by priority (3 → 2 → 1.5) if mesh count exceeds 24 or the material count exceeds 5
- Auto-snap: detect disconnected parts and snap to parent bounding box (threshold: 2.0 units)
- Emit: generate Three.js code — MeshStandardMaterial array, geometry constructors, parent-child hierarchy via group.add()
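As an illustration of the Validate and Budget steps listed above, here is a minimal sketch of one way to do them. It is not the project's SchemaCompiler code. It assumes parts reference their parent by `name`, that each instance becomes one mesh, and that pruning a part also removes its descendants. The material budget is left out for brevity.

```javascript
// Validate: order parts so every parent precedes its children, and reject
// unknown parents or cycles. Referencing parents by name is an assumption.
function topoSortParts(parts) {
  const byName = new Map(parts.map((p) => [p.name, p]));
  const state = new Map(); // name -> "visiting" | "done"
  const ordered = [];

  function visit(part) {
    const s = state.get(part.name);
    if (s === "done") return;
    if (s === "visiting") throw new Error(`cycle detected at "${part.name}"`);
    state.set(part.name, "visiting");
    if (part.parent != null) {
      const parent = byName.get(part.parent);
      if (!parent) {
        throw new Error(`"${part.name}" has unknown parent "${part.parent}"`);
      }
      visit(parent);
    }
    state.set(part.name, "done");
    ordered.push(part);
  }

  parts.forEach(visit);
  return ordered;
}

// Assumption: every instance of a part becomes one mesh.
function meshCount(parts) {
  return parts.reduce((n, p) => n + (p.instances?.length ?? 1), 0);
}

// Remove a part and everything attached beneath it, so no child is left
// pointing at a missing parent.
function removeWithDescendants(parts, name) {
  const doomed = new Set([name]);
  let grew = true;
  while (grew) {
    grew = false;
    for (const p of parts) {
      if (p.parent != null && doomed.has(p.parent) && !doomed.has(p.name)) {
        doomed.add(p.name);
        grew = true;
      }
    }
  }
  return parts.filter((p) => !doomed.has(p.name));
}

// Budget: drop the least important tier first (3, then 2, then 1.5) and stop
// as soon as the mesh count fits.
function pruneToBudget(parts, maxMeshes = 24) {
  let kept = [...parts];
  for (const tier of [3, 2, 1.5]) {
    // Later parts go first; this ordering is an arbitrary choice for the sketch.
    for (const victim of kept.filter((p) => p.priority === tier).reverse()) {
      if (meshCount(kept) <= maxMeshes) return kept;
      kept = removeWithDescendants(kept, victim.name);
    }
  }
  return kept;
}
```

Sorting before emitting means a parent group always exists by the time a child is attached with group.add(), and pruning by tier keeps the core silhouette while dropping decorative detail.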

We can even handle full scene generation from a single prompt or theme! The same spatial reasoning used to combine asset parts is used to place assets on the map. After each round, screenshots of the studio are taken from multiple angles and passed back to Gemini, which tweaks the coordinates and relations so the assets fit together more tidily.
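The screenshot feedback loop could be structured roughly like this. This is only a sketch: `renderViews` and `askModelForAdjustments` are hypothetical callbacks standing in for the studio's screenshot capture and the Gemini call, and the `{ id, position }` placement shape and the `maxRounds` limit are assumptions made for illustration.

```javascript
// Hypothetical refinement loop. `renderViews` and `askModelForAdjustments`
// are placeholders supplied by the caller, not functions from the project.
async function refineScene(placements, { renderViews, askModelForAdjustments, maxRounds = 3 }) {
  let current = placements;
  for (let round = 0; round < maxRounds; round++) {
    // Capture the current layout from several camera angles.
    const screenshots = await renderViews(current);
    // Ask the model which assets should move, given what it can see.
    const adjustments = await askModelForAdjustments(current, screenshots);
    if (adjustments.length === 0) break; // nothing left to tweak
    current = applyAdjustments(current, adjustments);
  }
  return current;
}

// Returns a new placement list with the suggested positions applied,
// leaving the input untouched.
function applyAdjustments(placements, adjustments) {
  const newPositions = new Map(adjustments.map((a) => [a.id, a.position]));
  return placements.map((p) =>
    newPositions.has(p.id) ? { ...p, position: newPositions.get(p.id) } : p
  );
}
```

Capping the number of rounds keeps the cost of a scene predictable, since every round is another multimodal request.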

Demo

Here is my Cloud Run link (the password is "buildwithelsewhere"):


Here is my quick little YouTube demo/trailer:

And here is the GitHub repo:

bug39 / elsewhere

3D world-building studio powered by Gemini. Generate assets from text, build worlds, create animations.

elsewhere

AI-Powered 3D World Studio

Describe what you want and AI builds it — 3D assets, entire scenes, living worlds you can explore in third person. Built for Google's Gemini 3 Hackathon

What You Can Do

  • Generate assets from text prompts — "a cozy cabin with smoke from the chimney"
  • Generate entire scenes — "a medieval village marketplace" plans, creates, and places everything
  • Arrange worlds on a 400m terrain with biomes, heightmaps, and textures
  • Script NPCs with behaviors and branching dialogue trees
  • Play your world in third person — walk, run, jump, talk to NPCs

Quick Start

npm install
npm run dev        # http://localhost:3000

If the hackathon results still aren't out, this might not be working yet! Requires a Gemini API key (free tier works).

Tech Stack

Preact, Three.js, Gemini 3 Flash, React Flow, IndexedDB

License

MIT




:D

What I Learned

I had LOTS of trouble prompt-engineering the generation model to output consistent results across a very wide diversity of prompts. It took maybe 30 iterations to get right. At one point, I had a CLI agent set up a mock studio where the base prompt had 15+ other tiny tweaks, and I judged the quality of each output, slowly picking out the weaknesses of each prompt, until I got to a place where I was very happy with how consistently the model could output differing geometries. I wanted to know how to very finely tweak these 3D models, and I couldn’t really rely on communicating that through text with my agent, so I had to get pretty familiar with Three.js!

Google Gemini Feedback

When Gemini 3 Flash Preview was in the pipeline, I was really missing that final "push" to get more detail out of the Three.js compiler I made. The recently released Gemini-3.1-flash-preview brought with it a HUGE improvement in spatial reasoning, and this was exactly what elsewhere needed (though the Cloud Run link is still running gemini-3-flash-preview, as I can't afford to pay for the newer model on a public link!). Using Gemini was a very smooth and easy experience. I had originally started this for a Gemini hackathon, so I was locked into that, but in early testing I found that Flash performed better, faster, and cheaper for my specific task of generating 3D models.

#ai #computer #gemini

Top comments (1)

Bunny Sleuth

WOW :D