What We Built and Learned
TL;DR: We built Forge Master, a small web app that turns short text prompts into production-ready 3D assets in about ninety seconds. It runs on Google Cloud Run with an NVIDIA L4 GPU, uses Imagen for crisp reference images, InstantMesh (Zero123++ + LRM) for mesh reconstruction, and Gemini for prompt enhancement and QA.
This post is published for the purposes of entering the Cloud Run Hackathon.
- Live demo: https://forge-master-frontend-525900378413.europe-west1.run.app
- GitHub repo: https://github.com/Keshraf/forge/
Demo video
Why we chose Cloud Run GPU
We wanted to run real 3D reconstruction without managing clusters. Cloud Run GPU let us ship a container with PyTorch, Diffusers, and InstantMesh, scale from zero, and keep costs bounded. Pairing a public frontend with an IAM-protected GPU backend made it safe to open the demo to everyone.
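As a rough sketch of what such a deploy looks like (the service name, image path, and instance caps here are hypothetical; the flags follow the Cloud Run GPU documentation), scale-to-zero plus a max-instances cap is what keeps costs bounded:

```shell
# Hypothetical deploy of the GPU service; names and limits are examples.
gcloud beta run deploy forge-gpu \
  --image=europe-west1-docker.pkg.dev/PROJECT/forge/gpu:latest \
  --region=europe-west1 \
  --gpu=1 --gpu-type=nvidia-l4 \
  --cpu=4 --memory=16Gi \
  --no-cpu-throttling \
  --min-instances=0 --max-instances=2 \
  --no-allow-unauthenticated  # IAM-gated: only the Agent may invoke it
```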
Architecture at a glance
- Frontend (Cloud Run): Next.js UI + React Three Fiber viewer
- Agent service (Cloud Run): four agents for prompt enhancement, generation coordination, quality assurance, and iterative improvement
- GPU service (Cloud Run GPU L4): Imagen → Zero123++ → LRM → post-process → export (GLB, OBJ, FBX, STL)
- Cloud Storage: serves model files and multi-view images
- Cloud Run IAM: only the Agent can call the GPU service
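The IAM-gated hop from the Agent to the GPU service can be sketched as follows. This is a minimal illustration, not the repo's actual code: the service URL is a placeholder, and `build_generate_request` is a helper name we made up.

```python
# Sketch of the Agent -> GPU service call. The Agent attaches a
# Google-signed ID token; Cloud Run IAM rejects any caller without
# the run.invoker role, so browsers can never reach the GPU path.
import json
import urllib.request

GPU_SERVICE_URL = "https://forge-gpu-example.europe-west1.run.app"  # placeholder

def build_generate_request(prompt: str, token: str) -> urllib.request.Request:
    """Compose the authenticated POST /generate call to the GPU service."""
    body = json.dumps({"prompt": prompt}).encode("utf-8")
    return urllib.request.Request(
        url=f"{GPU_SERVICE_URL}/generate",
        data=body,
        method="POST",
        headers={
            "Authorization": f"Bearer {token}",  # checked by Cloud Run IAM
            "Content-Type": "application/json",
        },
    )

# In production the token comes from the google-auth library:
#   from google.oauth2 import id_token
#   from google.auth.transport.requests import Request
#   token = id_token.fetch_id_token(Request(), GPU_SERVICE_URL)
```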
The weekend build: three things that worked
Small, explicit contracts between services.
The frontend calls POST /generate on the Agent; the Agent obtains an IAM token, invokes the GPU service, collects URLs and stats, runs QA, and returns a compact result. Clear payloads made debugging easy.
Reconstruction-friendly inputs.
We bias prompts toward centered objects, white backgrounds, and studio lighting. That single nudge reduced artifacts and improved mesh quality in Zero123++ and LRM.
A short, bounded "quality loop."
We compute mesh stats (vertices, faces, watertightness) and have Gemini score semantic fidelity. If the score is low, we re-run once with adjusted guidance. Tight loops beat unbounded retries.
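The bounded loop boils down to a few lines. This is a sketch under assumed names (`generate`, `score`, and the guidance values are ours, not the repo's); the real service scores fidelity with Gemini and checks mesh stats like vertex count and watertightness.

```python
def quality_loop(generate, score, threshold=0.7, guidance=(3.0, 5.0)):
    """Generate once, score it, and re-run at most once with stronger guidance.

    `generate(g)` returns a mesh result produced with guidance scale `g`;
    `score(result)` returns a 0-1 semantic-fidelity score (Gemini, in our case).
    """
    result = generate(guidance[0])
    if score(result) < threshold:      # low fidelity: exactly one bounded retry
        result = generate(guidance[1])
    return result
```

The key design choice is the hard cap: a retry either helps on the first adjustment or it is unlikely to help at all, so the loop never runs the expensive GPU path more than twice per request.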
What we learned about running our own models on Cloud Run GPU
- VRAM spikes happen. Zero123++ and LRM can spike memory. Fixing batch sizes, capping resolution, using FP16, and adding timeouts kept the service stable on a single L4.
- Cold starts are manageable. Trimming model load and keeping post-processing minimal kept p95 under ~140s, with typical runs around ~90s.
- Secure the expensive hop. The GPU path is IAM-gated. Browsers never call it directly, which kept the demo public without risking runaway costs.
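The VRAM guardrails from the first bullet can be sketched like this. The constants and helper are illustrative assumptions, not code from the repo:

```python
# Keep Zero123++/LRM stable on a single 24 GB L4 by fixing the batch
# size and capping resolution before anything reaches the GPU.
MAX_RESOLUTION = 768   # assumed cap on multi-view render size
BATCH_SIZE = 1         # fixed batch: no dynamic batching on one GPU

def clamp_resolution(requested: int, step: int = 64) -> int:
    """Cap the requested resolution and snap it down to a model-friendly multiple."""
    capped = min(requested, MAX_RESOLUTION)
    return max(step, (capped // step) * step)

# The FP16 half of the recipe, with diffusers (sketch):
#   pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
#   pipe.enable_attention_slicing()  # trades some speed for lower peak VRAM
```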
Try it
Prompt idea: “A medieval fantasy sword with ornate handle, centered, white background, studio lighting.”
Inspect the mesh in the viewer and download GLB, OBJ, FBX, or STL.
- Live demo: https://forge-master-frontend-525900378413.europe-west1.run.app
- GitHub repo: https://github.com/Keshraf/forge/
