How I Built an AI Design Platform That Renders Professional Architectural Visuals in Under 10 Seconds

#ai #interior #programming #productivity

The first time I showed a client a photorealistic render generated from their hand-drawn napkin sketch in under ten seconds, they thought I had a team of 3D artists on standby. I didn't. It was a single API call.

This post is about the technical decisions, the architecture choices, and the lessons learned building archybase.com — an all-in-one AI platform for interior design, exterior visualization, landscape generation, and sketch-to-render workflows.

The Problem Space

Architectural visualization has always been expensive. A single high-quality 3D render from a professional studio costs anywhere from $300 to $1,500 and takes 48–72 hours. For interior designers iterating on client preferences, that feedback loop is brutal. For homeowners trying to visualize a renovation before committing a six-figure budget, it's simply inaccessible.

AI image generation changed the equation, but general-purpose image generators like Stable Diffusion, Midjourney, or DALL·E are not purpose-built for architectural use cases. They hallucinate furniture, distort spatial proportions, and misinterpret structural elements. You get beautiful chaos, not professional renders.

The technical challenge was: how do you constrain generative AI to produce architecturally accurate, style-consistent, spatially coherent outputs, at scale, with sub-10-second latency?

Core Architecture

The Rendering Pipeline

The generation pipeline runs on a custom fine-tuned diffusion model with ControlNet conditioning. ControlNet is the key ingredient here — it allows the model to receive a structural "control signal" (depth maps, edge maps, pose maps) alongside the text prompt, so spatial layout is preserved even when the style is completely transformed.

For the Sketch to Render workflow, the pipeline looks like this:

User uploads a sketch or CAD drawing
Edge detection extracts the structural skeleton (Canny + HED)
ControlNet takes the edge map as its structural conditioning input (a strong guide for the layout rather than a strict guarantee)
The diffusion model generates a photorealistic render that respects the original structure
Post-processing upscales to 4K via a Real-ESRGAN step
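
To make the edge-extraction step more concrete, here is a minimal sketch in plain Python. It uses a simple Sobel gradient threshold as a stand-in, so it is neither Canny nor HED and it is not the production code; the function name, the threshold value, and the list-of-lists image format are all assumptions made for illustration.

```python
from typing import List

Image = List[List[int]]  # grayscale rows, pixel values 0-255


def sobel_edge_map(img: Image, threshold: int = 128) -> Image:
    """Return a binary edge map (0 or 255) from Sobel gradient magnitude.

    Illustrative stand-in for the Canny/HED step; border pixels are left at 0.
    """
    h, w = len(img), len(img[0])
    edges = [[0] * w for _ in range(h)]
    for y in range(1, h - 1):
        for x in range(1, w - 1):
            # Horizontal gradient: right column minus left column.
            gx = (img[y - 1][x + 1] + 2 * img[y][x + 1] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y][x - 1] - img[y + 1][x - 1])
            # Vertical gradient: bottom row minus top row.
            gy = (img[y + 1][x - 1] + 2 * img[y + 1][x] + img[y + 1][x + 1]
                  - img[y - 1][x - 1] - 2 * img[y - 1][x] - img[y - 1][x + 1])
            # L1 magnitude is a cheap approximation of the gradient norm.
            if abs(gx) + abs(gy) >= threshold:
                edges[y][x] = 255
    return edges


# A 5x6 test image: dark left half, bright right half.
sketch = [[0, 0, 0, 255, 255, 255] for _ in range(5)]
edge_map = sobel_edge_map(sketch)
```

On this toy input the only edge pixels are the two columns straddling the dark-to-bright boundary, which is the kind of structural skeleton the ControlNet step would then consume.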

For room redesign (the core AI Interior Design flow), we use a depth-conditioned ControlNet variant that preserves spatial relationships while completely replacing surface materials, furniture, and lighting.

Infrastructure Choices

The rendering workload runs on GPU clusters (NVIDIA A100s for high-res generation, T4s for standard tier). The stack is:

Next.js 15 (App Router) for the frontend and API routes
Railway for containerized deployment of the rendering service
Cloudflare R2 for storing input images and output renders at scale
Prisma + PostgreSQL for user state, generation history, and subscription management
Stripe for subscription billing (Starter / Plus / Professional tiers)

One non-obvious lesson: GPU cold starts are brutal for UX. When a GPU instance scales down to zero and then has to spin back up, you're looking at 30–60 second startup times. We solved this by keeping a minimum number of warm instances running during peak hours and implementing optimistic UI patterns to mask perceived latency.
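
As a rough illustration of the warm-instance idea, the sketch below computes a target instance count from the hour of day and the queue depth. The peak window, the instance counts, and the jobs-per-instance figure are invented placeholders, not our real settings, and `WarmPoolPolicy` and `target_instances` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WarmPoolPolicy:
    # All values below are assumed for illustration.
    peak_start_hour: int = 8      # inclusive
    peak_end_hour: int = 22       # exclusive
    peak_min_instances: int = 2   # kept warm even with an empty queue
    offpeak_min_instances: int = 0

    def min_instances(self, hour: int) -> int:
        """Floor on running GPU instances for the given hour (0-23)."""
        in_peak = self.peak_start_hour <= hour < self.peak_end_hour
        return self.peak_min_instances if in_peak else self.offpeak_min_instances


def target_instances(policy: WarmPoolPolicy, hour: int, queued_jobs: int,
                     jobs_per_instance: int = 4, max_instances: int = 8) -> int:
    """Scale with demand, but never below the warm floor or above the cap."""
    demand = -(-queued_jobs // jobs_per_instance)  # ceiling division
    return min(max_instances, max(policy.min_instances(hour), demand))


policy = WarmPoolPolicy()
print(target_instances(policy, hour=10, queued_jobs=0))  # warm floor during peak
print(target_instances(policy, hour=3, queued_jobs=9))   # demand-driven off-peak
```

The point of the floor is that a request arriving during peak hours never pays the full cold-start cost, while off-peak traffic can still scale to zero.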

The ControlNet Selection Problem

Different design tasks require different control modalities:

Task	ControlNet Type	Control Signal
Sketch to Render	Canny / HED	Edge maps
Room Redesign	Depth	Depth maps from MiDaS
Exterior Facade	Segmentation	Semantic masks
Landscape Design	Tile + Depth	Layout tiles + depth

Choosing the wrong ControlNet for a task is the single biggest source of output quality degradation. We built a task classifier that automatically selects the appropriate conditioning pipeline based on the input type and user intent.
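
A simplified sketch of that routing is shown here. The task-to-ControlNet mapping mirrors the table above, but the keyword rules in `classify_task` are a hypothetical stand-in for the real classifier, and every identifier is an assumption made for this example.

```python
from enum import Enum
from typing import Tuple


class Task(Enum):
    SKETCH_TO_RENDER = "sketch_to_render"
    ROOM_REDESIGN = "room_redesign"
    EXTERIOR_FACADE = "exterior_facade"
    LANDSCAPE_DESIGN = "landscape_design"


# Mapping taken from the table in this section.
CONTROLNETS = {
    Task.SKETCH_TO_RENDER: ("canny", "hed"),
    Task.ROOM_REDESIGN: ("depth",),
    Task.EXTERIOR_FACADE: ("segmentation",),
    Task.LANDSCAPE_DESIGN: ("tile", "depth"),
}


def classify_task(input_kind: str, intent: str) -> Task:
    """Toy rule-based classifier: input type first, then intent keywords."""
    text = intent.lower()
    if input_kind in {"sketch", "cad"}:
        return Task.SKETCH_TO_RENDER
    if any(word in text for word in ("garden", "backyard", "landscape")):
        return Task.LANDSCAPE_DESIGN
    if any(word in text for word in ("facade", "exterior")):
        return Task.EXTERIOR_FACADE
    return Task.ROOM_REDESIGN  # default for interior photos


def select_controlnets(input_kind: str, intent: str) -> Tuple[str, ...]:
    """Pick the conditioning types for a request."""
    return CONTROLNETS[classify_task(input_kind, intent)]


print(select_controlnets("photo", "Transform this backyard into a zen garden"))
```

Keeping the mapping in a single table-like dictionary makes the expensive mistake, pairing a task with the wrong control signal, easy to audit in one place.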

The Landscape Design Challenge

Of all the product surfaces, the AI Landscape Design tool presented the most distinctive engineering challenges.

Landscape outputs are inherently seasonal and temporal — the same garden looks completely different in summer versus autumn versus winter. We needed the model to understand not just spatial layout but lighting direction, foliage density, and seasonal color palettes.

We solved this with a two-stage approach:

A scene graph generator that parses the input photo and produces a structured layout (hardscape vs. softscape vs. water features vs. structures)
A conditioning stack that combines depth maps with the scene graph labels to give the model explicit semantic information about what each region of the image represents

The result is that when a user says "transform this backyard into a Japanese zen garden in autumn," the model knows which parts of the image are plantable, which are hardscape, and which are architectural — so it doesn't try to grow moss on a swimming pool.
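
Here is a small sketch of how scene graph labels and a depth map might be combined into one conditioning grid. The four categories come from the description above; the `Cell` structure, the rule that only softscape is plantable, and the tiny 2x2 grids are simplifying assumptions, not the actual data model.

```python
from dataclasses import dataclass
from typing import List

CATEGORIES = ("hardscape", "softscape", "water", "structure")


@dataclass(frozen=True)
class Cell:
    category: str    # scene graph label for this region
    depth: float     # relative depth, e.g. 0.0 (near) to 1.0 (far)
    plantable: bool  # assumed rule: only softscape may receive plants


def build_conditioning(labels: List[List[str]],
                       depth: List[List[float]]) -> List[List[Cell]]:
    """Fuse a label grid and a depth grid of the same shape, cell by cell."""
    if len(labels) != len(depth) or any(
            len(a) != len(b) for a, b in zip(labels, depth)):
        raise ValueError("label grid and depth map must have the same shape")
    grid = []
    for label_row, depth_row in zip(labels, depth):
        row = []
        for category, d in zip(label_row, depth_row):
            if category not in CATEGORIES:
                raise ValueError(f"unknown category: {category}")
            row.append(Cell(category, d, plantable=(category == "softscape")))
        grid.append(row)
    return grid


def plantable_mask(grid: List[List[Cell]]) -> List[List[int]]:
    """1 where vegetation may be generated, 0 elsewhere."""
    return [[1 if cell.plantable else 0 for cell in row] for row in grid]


labels = [["softscape", "water"], ["hardscape", "softscape"]]
depth = [[0.2, 0.5], [0.7, 0.9]]
mask = plantable_mask(build_conditioning(labels, depth))
```

In this toy example the pool cell is labeled `water`, so it is masked out of the plantable area and the moss stays on the lawn.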

SEO as a Growth Channel

One thing I want to be transparent about for other indie developers: organic search was the backbone of our early-stage growth, and it is worth serious investment if you're not paying for ads.

We built a programmatic SEO content matrix around the core value propositions — room types × design styles × use cases. Each combination (e.g., "Scandinavian modern living room AI render") gets a dedicated landing page with schema markup, proper breadcrumbs, and semantically rich content.

The architecture:

Dynamic routes in Next.js serving URL patterns such as /ai-[room]-design and /[style]-[space]-render
JSON-LD structured data (HowTo, Product, BreadcrumbList schemas)
Google Search Console (GSC) + Plausible for measuring organic traffic and conversion rates
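
The content matrix itself is easy to sketch. The Python below expands a style × space grid into paths following the /[style]-[space]-render pattern and builds a schema.org BreadcrumbList for one page. The style and space lists are made-up samples, the helper names are hypothetical, and the real pages are rendered in Next.js rather than Python.

```python
import json
from itertools import product
from typing import Dict, List

BASE_URL = "https://archybase.com"
STYLES = ["scandinavian", "japandi"]   # sample values only
SPACES = ["living-room", "kitchen"]    # sample values only


def page_paths(styles: List[str], spaces: List[str]) -> List[str]:
    """One landing page path per style/space combination."""
    return [f"/{style}-{space}-render" for style, space in product(styles, spaces)]


def breadcrumb_jsonld(path: str, title: str) -> Dict:
    """Build a two-level BreadcrumbList (Home > page) as a JSON-LD dict."""
    crumbs = [("Home", BASE_URL + "/"), (title, BASE_URL + path)]
    return {
        "@context": "https://schema.org",
        "@type": "BreadcrumbList",
        "itemListElement": [
            {"@type": "ListItem", "position": i, "name": name, "item": url}
            for i, (name, url) in enumerate(crumbs, start=1)
        ],
    }


paths = page_paths(STYLES, SPACES)
jsonld = breadcrumb_jsonld(paths[0], "Scandinavian Living Room Render")
print(json.dumps(jsonld, indent=2))
```

The serialized output is what would go inside a `<script type="application/ld+json">` tag on each generated page.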

It took about 4 months to see meaningful organic traction, but now roughly 60% of new signups come through organic search.

Pricing Architecture

We run three paid tiers (Starter, Plus, Professional) with generation credits as the primary consumption unit. A key architectural decision was separating standard and "pro" generation credits. Standard credits run on the T4-based pipeline, which returns results faster at slightly lower quality; pro credits run on A100s with enhanced upscaling, more inference steps, and higher coherence.

This lets price-sensitive users still get real value at the entry tier while giving power users (professional architects, real estate agents) a clear reason to upgrade.
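
A minimal sketch of the two-credit model, assuming one credit per generation: spending a credit both debits the balance and decides which GPU tier serves the job. The class and function names are hypothetical, and the real system stores balances in PostgreSQL rather than in memory.

```python
from dataclasses import dataclass

# Credit type -> GPU tier, as described above.
GPU_FOR_CREDIT = {"standard": "T4", "pro": "A100"}


class InsufficientCredits(Exception):
    """Raised when a user tries to spend credits they do not have."""


@dataclass
class CreditBalance:
    standard: int = 0
    pro: int = 0

    def spend(self, kind: str, amount: int = 1) -> str:
        """Debit credits of the given kind and return the GPU tier to use."""
        if kind not in GPU_FOR_CREDIT:
            raise ValueError(f"unknown credit type: {kind}")
        if amount <= 0:
            raise ValueError("amount must be positive")
        available = getattr(self, kind)
        if available < amount:
            raise InsufficientCredits(f"need {amount} {kind}, have {available}")
        setattr(self, kind, available - amount)
        return GPU_FOR_CREDIT[kind]


balance = CreditBalance(standard=2, pro=1)
print(balance.spend("pro"))       # routed to the A100 pipeline
print(balance.spend("standard"))  # routed to the T4 pipeline
```

Keeping the two balances separate, instead of pricing pro renders as several standard credits, means a pro generation can never silently drain the entry-tier allowance.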

What's Next

The roadmap includes:

AI Floor Plan Generator (currently in beta) — generating 2D floor plans from text descriptions and converting them to 3D walkthroughs
Video rendering — we already have a Remotion-based video rendering service running with GPU-accelerated EGL for generating animated flythrough renders
Collaboration features — shared project workspaces for architect-client collaboration

Final Thoughts

Building a production-grade AI image generation product is not just a machine learning problem — it's an infrastructure problem, a product problem, and a UX problem simultaneously. The model quality sets your ceiling, but cold start latency, credit economics, and the clarity of your user workflow determine whether people actually stick around.

If you're curious about the product or want to explore what AI-native architectural visualization looks like in practice, check out archybase.com.