Ketan Saraf
Forge Master: Text-to-3D on Cloud Run GPU

What We Built and Learned

TL;DR: We built Forge Master, a small web app that turns short text prompts into production-ready 3D assets in about ninety seconds. It runs on Google Cloud Run with an NVIDIA L4 GPU, uses Imagen for crisp reference images, InstantMesh (Zero123++ + LRM) for reconstruction, and Gemini for prompt enhancement and QA.
This post is published for the purposes of entering the Cloud Run Hackathon.


Demo video


Why we chose Cloud Run GPU

We wanted to run real 3D reconstruction without managing clusters. Cloud Run GPU let us ship a container with PyTorch, Diffusers, and InstantMesh, scale from zero, and keep costs bounded. Pairing a public frontend with an IAM-protected GPU backend made it safe to open the demo to everyone.


Architecture at a glance

  • Frontend (Cloud Run): Next.js UI + React Three Fiber viewer
  • Agent service (Cloud Run): four agents for prompt enhancement, generation coordination, quality assurance, and iterative improvement
  • GPU service (Cloud Run GPU L4): Imagen → Zero123++ → LRM → post-process → export (GLB, OBJ, FBX, STL)
  • Cloud Storage: serves model files and multi-view images
  • Cloud Run IAM: only the Agent can call the GPU service (token flow sketched below the diagram)

Forge Master architecture
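
For reference, here's roughly what that IAM-gated hop looks like from the Agent's side. A minimal sketch, not our exact code: the service URL and payload shape are placeholders, but `fetch_id_token` is the standard google-auth call for minting an ID token scoped to a Cloud Run audience.

```python
import requests
from google.auth.transport.requests import Request
from google.oauth2 import id_token

GPU_SERVICE_URL = "https://gpu-service-xyz-uc.a.run.app"  # placeholder URL

def call_gpu_service(payload: dict) -> dict:
    # Mint an ID token whose audience is the receiving Cloud Run service.
    # The Agent's service account needs roles/run.invoker on the GPU service.
    token = id_token.fetch_id_token(Request(), GPU_SERVICE_URL)
    resp = requests.post(
        f"{GPU_SERVICE_URL}/generate",
        json=payload,
        headers={"Authorization": f"Bearer {token}"},
        timeout=300,  # generation is slow; don't let the hop hang forever
    )
    resp.raise_for_status()
    return resp.json()
```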


The weekend build: three things that worked

  1. Small, explicit contracts between services.
    The frontend calls POST /generate on the Agent; the Agent obtains an IAM token, invokes the GPU service, collects URLs and stats, runs QA, and returns a compact result. Clear payloads made debugging easy (the first sketch after this list shows the shape).

  2. Reconstruction-friendly inputs.
    We bias prompts toward centered objects, a white background, and studio lighting. That single nudge reduced artifacts and improved mesh quality in Zero123++ and LRM; the same bias appears in the first sketch after this list.

  3. A short, bounded “quality loop.”
    We compute mesh stats (vertices, faces, watertightness) and have Gemini score semantic fidelity. If the score is low, we re-run once with adjusted guidance. Tight loops beat unbounded retries (the second sketch below shows the loop).
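
To make those contracts concrete, here's a sketch of the Agent's side of the POST /generate flow, with the reconstruction-friendly bias from point 2 applied before the GPU call. Field names are illustrative, not our exact schema.

```python
RECONSTRUCTION_HINTS = "centered, white background, studio lighting"

def enhance_prompt(user_prompt: str) -> str:
    # Bias every prompt toward inputs Zero123++ and LRM handle well.
    return f"{user_prompt}, {RECONSTRUCTION_HINTS}"

def handle_generate(user_prompt: str) -> dict:
    # call_gpu_service is the IAM-gated helper from the earlier sketch.
    result = call_gpu_service({
        "prompt": enhance_prompt(user_prompt),
        "formats": ["glb", "obj", "fbx", "stl"],
    })
    # Return a compact result to the frontend: asset URLs plus mesh stats.
    return {
        "model_urls": result["model_urls"],
        "preview_images": result["preview_images"],
        "stats": result["stats"],  # vertices, faces, watertight
    }
```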
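And the bounded quality loop from point 3. `score_with_gemini` stands in for our Gemini QA call, and the threshold and guidance tweak are illustrative; the property that matters is the single, bounded retry.

```python
import trimesh

def mesh_stats(glb_path: str) -> dict:
    # trimesh gives us the three stats we gate on.
    mesh = trimesh.load(glb_path, force="mesh")
    return {
        "vertices": len(mesh.vertices),
        "faces": len(mesh.faces),
        "watertight": mesh.is_watertight,
    }

def generate_with_qa(prompt: str, score_with_gemini) -> dict:
    result = call_gpu_service({"prompt": prompt})
    # Gemini scores semantic fidelity of the renders against the prompt (0-1).
    score = score_with_gemini(prompt, result["preview_images"])
    if score < 0.6:  # illustrative threshold
        # Exactly one retry with adjusted guidance -- never an open-ended loop.
        result = call_gpu_service({"prompt": prompt, "guidance_scale": 9.0})
    return result
```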


What we learned about running our own models on Cloud Run GPU

  • VRAM spikes happen. Zero123++ and LRM can both spike memory mid-pipeline. Fixing batch sizes, capping resolution, running in FP16, and adding timeouts kept the service stable on a single L4 (sketched after this list).
  • Cold starts are manageable. Trimming model loading and keeping post-processing minimal kept p95 latency under ~140s, with typical runs around ~90s.
  • Secure the expensive hop. The GPU path is IAM-gated. Browsers never call it directly, which kept the demo public without risking runaway costs.
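
The guardrails from the first bullet, sketched for a diffusers-style pipeline. The checkpoint ID and limits are placeholders, not our exact values; the pattern is what kept us stable: FP16 weights, a hard resolution cap, and one request at a time.

```python
import torch
from diffusers import DiffusionPipeline

MAX_RESOLUTION = 768  # hard cap; larger renders risk OOM on the L4's 24 GB

# FP16 weights roughly halve VRAM versus FP32.
# "your-org/your-checkpoint" is a placeholder, not our exact model ID.
pipe = DiffusionPipeline.from_pretrained(
    "your-org/your-checkpoint",
    torch_dtype=torch.float16,
).to("cuda")

def generate_views(image, resolution: int = 640):
    # Clamp resolution and process a single request per call so the
    # memory envelope stays predictable on one L4.
    resolution = min(resolution, MAX_RESOLUTION)
    return pipe(image, height=resolution, width=resolution).images
```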

Try it

Prompt idea: “A medieval fantasy sword with ornate handle, centered, white background, studio lighting.”
Inspect the mesh in the viewer and download GLB, OBJ, FBX, or STL.

