Om Prakash

Posted on Apr 12

How We Built a $0.01 Image-to-3D API with PBR Textures Using Hunyuan3D 2.1

#ai #3d #api #hunyuan3d

Turning a Single Image into a Production-Ready 3D Model for $0.01

Last week we shipped something our users have been asking for: image-to-3D generation with PBR textures on PixelAPI. Here's the engineering breakdown of how we got there, what we learned, and why we priced it at just $0.01 per model.

The Problem

Most image-to-3D APIs fall into two camps:

Enterprise-only: Luma AI, CSM.ai — $0.10 to $0.50 per model, API access requires sales calls
Subscription-locked: Meshy ($20/mo), Tripo ($12-140/mo) — you pay monthly whether you generate 1 model or 1000

Developers building e-commerce tools, game asset pipelines, or AR/VR apps need pay-per-use pricing with no commitments. And they need good quality — untextured meshes aren't useful for production.

Choosing the Right Model

We evaluated three open-source models:

Model	Shape Quality (ULIP-T, higher is better)	Textures	VRAM	License	Auth Required
TRELLIS (Microsoft)	0.0769	Basic	~20GB	MIT	Yes (gated HF)
TripoSR	0.0767	Basic	~8GB	MIT	No
Hunyuan3D 2.1	0.0774	PBR	~29GB	Apache 2.0	No

Hunyuan3D 2.1 won on the metrics that matter most for production use: the highest shape score of the three (0.0774, by a narrow margin), full PBR texture support (albedo + normal + roughness maps), and ungated model weights that download without an access token.

The tradeoff: it needs ~29GB VRAM, which in our fleet means the RTX 6000 Ada (48GB); our RTX 4070s (16GB) can't run it. We dedicated our LLM3 machine (RTX 6000 Ada) as the 3D worker.

The Architecture

User Upload
    → POST /v1/3d/generate (Gateway)
    → Image saved to storage
    → Job pushed to Redis queue (pixelapi:3d:jobs)
    → Worker picks up job
    → Shape generation (~45s)
    → PBR texture painting (~45s)
    → GLB uploaded to CDN
    → Result returned via API

Key decisions:

Standalone worker, not integrated: Hunyuan3D uses ~29GB VRAM continuously when loaded. Mixing it with our image generation workers (which use 12-16GB) would cause constant OOM kills. The 3D worker runs as a separate systemd service.

Blocking request over WebSockets: Generation takes ~90 seconds total. The client makes one synchronous request and the endpoint blocks until the job completes, rather than streaming progress over WebSockets. Simpler architecture, works with all clients.

Redis queue: Same pattern as our image generation — jobs in Redis, worker pops and processes. Allows easy horizontal scaling if we add more GPU machines.
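
The queue-and-worker flow above can be sketched roughly as follows. This is an illustration, not PixelAPI's actual worker: only the queue name pixelapi:3d:jobs and the two generation stages come from the post. The job fields, the stage callables, and the InMemoryQueue stand-in (which mimics the rpush/blpop calls of a Redis client so the sketch runs without a server) are assumptions.

```python
import json
from collections import deque

QUEUE_KEY = "pixelapi:3d:jobs"  # queue name from the post


class InMemoryQueue:
    """Hypothetical stand-in exposing rpush/blpop like a Redis client."""

    def __init__(self):
        self._lists = {}

    def rpush(self, key, value):
        self._lists.setdefault(key, deque()).append(value)

    def blpop(self, key, timeout=0):
        # A real Redis BLPOP waits up to `timeout`; this stand-in returns at once.
        items = self._lists.get(key)
        if not items:
            return None
        return key, items.popleft()


def process_next_job(rdb, generate_shape, paint_textures, upload_glb):
    """Pop one job and run shape -> texture -> upload. Returns a result or None."""
    popped = rdb.blpop(QUEUE_KEY, timeout=5)
    if popped is None:
        return None  # queue was empty; the caller simply loops again
    _key, payload = popped
    job = json.loads(payload)  # assumed job format: {"id": ..., "image_path": ...}
    mesh = generate_shape(job["image_path"])            # shape stage
    textured = paint_textures(mesh, job["image_path"])  # PBR texture stage
    url = upload_glb(job["id"], textured)               # GLB goes to the CDN
    return {"id": job["id"], "status": "completed", "output_url": url}
```

Because the worker only pops from a shared list, adding a second GPU machine would mean starting another process that runs the same loop against the same key.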

The Hard Parts

Building this was not smooth. Here's every bug we hit:

1. The target_reduction bug

Hunyuan3D's mesh simplification calls trimesh's simplify_quadric_decimation(). The code passed target_count (40000) as a positional argument, which Python bound to the first parameter, percent. So trimesh tried to simplify with percent=40000, far outside the valid 0 to 1 range. The fix: pass the value by keyword, face_count=target_count.
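
To make the failure mode concrete, here is a small sketch of the positional-versus-keyword mix-up. The function below is a stand-in written for this example with the parameter order described above; it is not trimesh itself, and the ValueError it raises is an assumption about how an out-of-range percent might be rejected.

```python
def simplify_quadric_decimation(percent=None, face_count=None):
    """Stand-in with the parameter order described above; NOT the trimesh method."""
    if percent is not None:
        if not 0.0 <= percent <= 1.0:
            raise ValueError(f"percent must be between 0 and 1, got {percent}")
        return f"reduce by fraction {percent}"
    if face_count is not None:
        return f"reduce to {face_count} faces"
    raise ValueError("need percent or face_count")


target_count = 40000

try:
    # Buggy call: the positional value binds to `percent`, the first parameter.
    simplify_quadric_decimation(target_count)
    buggy_call_failed = False
except ValueError:
    buggy_call_failed = True

# Fixed call: naming the parameter routes the value where it was intended.
fixed = simplify_quadric_decimation(face_count=target_count)
```

Passing such arguments by keyword also protects the call if the library reorders its parameters in a later release.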

2. Missing C++ extensions

Two compiled modules needed building:

mesh_inpaint_processor.cpp (pybind11) — handles vertex inpainting for texture painting
custom_rasterizer (CUDA) — differentiable renderer for multi-view generation

Neither shipped pre-compiled. The compile script had hardcoded python (not python3), and custom_rasterizer_kernel needed LD_LIBRARY_PATH pointing to PyTorch's lib directory.
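
One way to handle the LD_LIBRARY_PATH requirement is to compute PyTorch's lib directory at launch time instead of hardcoding it. The sketch below is an assumption about how that could look, not the fix we shipped; both helper names are hypothetical. It relies only on the fact that PyTorch keeps its shared libraries in the lib folder inside the installed torch package.

```python
import importlib.util
import os


def find_torch_lib_dir():
    """Return <site-packages>/torch/lib, or None if PyTorch is not installed."""
    spec = importlib.util.find_spec("torch")  # locates the package without importing it
    if spec is None or not spec.submodule_search_locations:
        return None
    return os.path.join(list(spec.submodule_search_locations)[0], "lib")


def env_with_lib_dir(base_env, lib_dir):
    """Copy base_env with lib_dir prepended to LD_LIBRARY_PATH."""
    env = dict(base_env)
    existing = env.get("LD_LIBRARY_PATH", "")
    env["LD_LIBRARY_PATH"] = lib_dir + (os.pathsep + existing if existing else "")
    return env
```

The resulting dictionary can be passed as the env argument of subprocess.run when starting the build or the worker, so the dynamic loader can resolve the libraries the extension links against.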

3. The Redis connection issue

Our gateway uses aioredis (async Redis). The 3D endpoint imported rdb from the queue module at load time, but rdb is None until init_redis() runs during app startup, so the endpoint held on to None even after the real connection existed. Solution: a lazy get_3d_rdb() function that creates its own connection on first use.
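
The lazy-initialization pattern can be sketched like this. It is a simplified illustration rather than our gateway code: the real get_3d_rdb() takes no arguments, while this version accepts a hypothetical async connect factory so the example runs without a Redis server.

```python
import asyncio

_rdb = None       # module-level handle, stays None until first use
_rdb_lock = None  # created lazily, inside a running event loop


async def get_3d_rdb(connect):
    """Return the shared connection, creating it on the first call.

    `connect` is a hypothetical async factory that returns a client object.
    """
    global _rdb, _rdb_lock
    if _rdb is None:
        if _rdb_lock is None:
            _rdb_lock = asyncio.Lock()
        async with _rdb_lock:
            if _rdb is None:  # re-check: another task may have connected meanwhile
                _rdb = await connect()
    return _rdb
```

The lock only matters for the first few requests after startup: without it, two requests arriving together could each open a connection.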

4. The bpy (Blender) trap

Hunyuan3D imports bpy (Blender's Python module) in its mesh utilities. Ubuntu's blender package doesn't expose bpy as a Python module — you'd need to build Blender from source or use the standalone bpy pip package (which doesn't exist for Python 3.10). We made bpy import optional with a mock module, then fixed the actual code paths to not need it.
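
The optional import can be sketched as below. This is an approximation of the approach described above rather than our exact patch; the HAVE_BPY flag is a name introduced for this example.

```python
import sys
import types

try:
    import bpy  # the real Blender module, if this environment has one
    HAVE_BPY = True
except ImportError:
    # Register an empty placeholder so later `import bpy` statements succeed.
    bpy = types.ModuleType("bpy")
    sys.modules["bpy"] = bpy
    HAVE_BPY = False
```

The placeholder has no attributes, so any code path that actually calls into bpy still fails with an AttributeError. That is why the mock was only a first step and the mesh utilities themselves had to stop depending on it.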

Pricing Math

Our rule: 2x cheaper than the cheapest mainstream competitor.

Tripo3D Pro: ~$0.0066/model ($19.90/mo for 3000 credits)
Meshy Pro: ~$0.02/model ($20/mo for 1000 credits)
PixelAPI: $0.01/model (10 credits)

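We miss the 2x rule against Tripo3D: at $0.01 we are about 1.5x their effective per-model price, which we treat as loss-leader subscription pricing and which only holds if the full 3000-credit quota is used. Against Meshy we land at exactly half the price, and we are 10-50x cheaper than Luma/enterprise options.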
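
The per-model figures above come from dividing each plan's monthly price by its monthly credits. A quick check of the arithmetic, assuming one credit per model and a fully used quota (both assumptions are built into the numbers quoted above):

```python
# (monthly price in USD, credits per month), as listed above
plans = {
    "Tripo3D Pro": (19.90, 3000),
    "Meshy Pro": (20.00, 1000),
}
PIXELAPI_PRICE = 0.01  # USD per model

# Assumes one credit buys one model and no credits go unused.
per_model = {name: price / credits for name, (price, credits) in plans.items()}

# Ratio above 1 means the plan costs more per model than PixelAPI.
ratio = {name: cost / PIXELAPI_PRICE for name, cost in per_model.items()}

for name in plans:
    print(f"{name}: ${per_model[name]:.4f}/model, {ratio[name]:.2f}x PixelAPI's price")
```

A subscriber who uses only part of the monthly quota pays proportionally more per model, which is where pay-per-use pricing changes the comparison.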

Cost per model for us:

GPU time: ~90s on RTX 6000 Ada → ~$0.001 electricity
Storage: ~4-22MB GLB per model → negligible
Bandwidth: ~5-25MB download → ~$0.0002

On marginal cost (hardware amortization not counted), we're profitable on day one, even at $0.01.
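
For readers who want to reproduce the electricity estimate, here is the arithmetic as a short sketch. The 300 W power draw and the $0.13/kWh rate are assumptions chosen for illustration, not measured values; the 90 seconds, the bandwidth cost, and the price come from the figures above.

```python
GPU_SECONDS = 90         # generation time per model, from the post
GPU_POWER_KW = 0.300     # ASSUMPTION: about 300 W drawn under load
PRICE_PER_KWH = 0.13     # ASSUMPTION: electricity rate in USD per kWh
BANDWIDTH_COST = 0.0002  # USD per model, from the post
PRICE = 0.01             # USD charged per model

energy_kwh = GPU_POWER_KW * GPU_SECONDS / 3600  # kW times hours
electricity = energy_kwh * PRICE_PER_KWH        # roughly $0.001
marginal_cost = electricity + BANDWIDTH_COST    # storage treated as negligible
margin = PRICE - marginal_cost

print(f"electricity ${electricity:.6f}, marginal cost ${marginal_cost:.6f}, margin ${margin:.6f}")
```

Under these assumptions the marginal cost is a little over a tenth of the price. The sketch leaves out hardware purchase cost and idle GPU time.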

What's Next

GPU priority scheduler: Currently 3D shares LLM3 with video generation and our Mushika rendering service. We need intelligent queue management that preempts lower-priority work when revenue jobs arrive.
Multi-model support: TripoSR for fast/cheap models (~10s), Hunyuan3D for quality.
3D model marketplace: Let users sell generated 3D assets.

Try It

curl -X POST https://api.pixelapi.dev/v1/3d/generate \
  -H "Authorization: Bearer YOUR_KEY" \
  -F "image=@product.jpg" \
  -F "format=glb"

# Returns: {"status":"completed","output_url":"...glb","generation_time":88.5}

If you're building anything with 3D APIs, I'd love to hear about it. Find me on X/Twitter or Discord.

DEV Community