Ketan Saraf
Forge Master: Text-to-3D on Cloud Run GPU

What We Built and Learned

TL;DR: We built Forge Master, a small web app that turns short text prompts into production-ready 3D assets in about ninety seconds. It runs on Google Cloud Run with an NVIDIA L4 GPU, uses Imagen for crisp reference images, InstantMesh (Zero123++ + LRM) for reconstruction, and Gemini for prompt enhancement and QA.
This post is published for the purposes of entering the Cloud Run Hackathon.


Demo video


Why we chose Cloud Run GPU

We wanted to run real 3D reconstruction without managing clusters. Cloud Run GPU let us ship a container with PyTorch, Diffusers, and InstantMesh, scale from zero, and keep costs bounded. Pairing a public frontend with an IAM-protected GPU backend made it safe to open the demo to everyone.


Architecture at a glance

  • Frontend (Cloud Run): Next.js UI + React Three Fiber viewer
  • Agent service (Cloud Run): four agents for prompt enhancement, generation coordination, quality assurance, and iterative improvement
  • GPU service (Cloud Run GPU L4): Imagen → Zero123++ → LRM → post-process → export (GLB, OBJ, FBX, STL)
  • Cloud Storage: serves model files and multi-view images
  • Cloud Run IAM: only the Agent can call the GPU service (token flow sketched below the diagram)

Forge Master architecture
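
For reference, here's roughly what that IAM-gated hop looks like from the Agent's side. A minimal sketch, not our exact code: the service URL and payload shape are placeholders, but `fetch_id_token` is the standard google-auth call for minting an ID token scoped to a Cloud Run audience.

```python
import requests
from google.auth.transport.requests import Request
from google.oauth2 import id_token

GPU_SERVICE_URL = "https://gpu-service-xyz-uc.a.run.app"  # placeholder URL

def call_gpu_service(payload: dict) -> dict:
    # Mint an ID token whose audience is the receiving Cloud Run service.
    # The Agent's service account needs roles/run.invoker on the GPU service.
    token = id_token.fetch_id_token(Request(), GPU_SERVICE_URL)
    resp = requests.post(
        f"{GPU_SERVICE_URL}/generate",
        json=payload,
        headers={"Authorization": f"Bearer {token}"},
        timeout=300,  # generation is slow; don't let the hop hang forever
    )
    resp.raise_for_status()
    return resp.json()
```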


The weekend build: three things that worked

  1. Small, explicit contracts between services.
    The frontend calls POST /generate on the Agent; the Agent obtains an IAM token, invokes the GPU service, collects URLs and stats, runs QA, and returns a compact result. Clear payloads made debugging easy (the first sketch after this list shows the shape).

  2. Reconstruction-friendly inputs.
    We bias prompts toward centered objects, a white background, and studio lighting. That single nudge reduced artifacts and improved mesh quality in Zero123++ and LRM; the same bias appears in the first sketch after this list.

  3. A short, bounded “quality loop.”
    We compute mesh stats (vertices, faces, watertightness) and have Gemini score semantic fidelity. If the score is low, we re-run once with adjusted guidance. Tight loops beat unbounded retries (the second sketch below shows the loop).
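
To make those contracts concrete, here's a sketch of the Agent's side of the POST /generate flow, with the reconstruction-friendly bias from point 2 applied before the GPU call. Field names are illustrative, not our exact schema.

```python
RECONSTRUCTION_HINTS = "centered, white background, studio lighting"

def enhance_prompt(user_prompt: str) -> str:
    # Bias every prompt toward inputs Zero123++ and LRM handle well.
    return f"{user_prompt}, {RECONSTRUCTION_HINTS}"

def handle_generate(user_prompt: str) -> dict:
    # call_gpu_service is the IAM-gated helper from the earlier sketch.
    result = call_gpu_service({
        "prompt": enhance_prompt(user_prompt),
        "formats": ["glb", "obj", "fbx", "stl"],
    })
    # Return a compact result to the frontend: asset URLs plus mesh stats.
    return {
        "model_urls": result["model_urls"],
        "preview_images": result["preview_images"],
        "stats": result["stats"],  # vertices, faces, watertight
    }
```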
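And the bounded quality loop from point 3. `score_with_gemini` stands in for our Gemini QA call, and the threshold and guidance tweak are illustrative; the property that matters is the single, bounded retry.

```python
import trimesh

def mesh_stats(glb_path: str) -> dict:
    # trimesh gives us the three stats we gate on.
    mesh = trimesh.load(glb_path, force="mesh")
    return {
        "vertices": len(mesh.vertices),
        "faces": len(mesh.faces),
        "watertight": mesh.is_watertight,
    }

def generate_with_qa(prompt: str, score_with_gemini) -> dict:
    result = call_gpu_service({"prompt": prompt})
    # Gemini scores semantic fidelity of the renders against the prompt (0-1).
    score = score_with_gemini(prompt, result["preview_images"])
    if score < 0.6:  # illustrative threshold
        # Exactly one retry with adjusted guidance -- never an open-ended loop.
        result = call_gpu_service({"prompt": prompt, "guidance_scale": 9.0})
    return result
```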


What we learned about running our own models on Cloud Run GPU

  • VRAM spikes happen. Zero123++ and LRM can both spike memory mid-pipeline. Fixing batch sizes, capping resolution, running in FP16, and adding timeouts kept the service stable on a single L4 (sketched after this list).
  • Cold starts are manageable. Trimming model loading and keeping post-processing minimal kept p95 latency under ~140s, with typical runs around ~90s.
  • Secure the expensive hop. The GPU path is IAM-gated. Browsers never call it directly, which kept the demo public without risking runaway costs.
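
The guardrails from the first bullet, sketched for a diffusers-style pipeline. The checkpoint ID and limits are placeholders, not our exact values; the pattern is what kept us stable: FP16 weights, a hard resolution cap, and one request at a time.

```python
import torch
from diffusers import DiffusionPipeline

MAX_RESOLUTION = 768  # hard cap; larger renders risk OOM on the L4's 24 GB

# FP16 weights roughly halve VRAM versus FP32.
# "your-org/your-checkpoint" is a placeholder, not our exact model ID.
pipe = DiffusionPipeline.from_pretrained(
    "your-org/your-checkpoint",
    torch_dtype=torch.float16,
).to("cuda")

def generate_views(image, resolution: int = 640):
    # Clamp resolution and process a single request per call so the
    # memory envelope stays predictable on one L4.
    resolution = min(resolution, MAX_RESOLUTION)
    return pipe(image, height=resolution, width=resolution).images
```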

Try it

Prompt idea: “A medieval fantasy sword with ornate handle, centered, white background, studio lighting.”
Inspect the mesh in the viewer and download GLB, OBJ, FBX, or STL.

