In the previous article you built an agent that generates on-brand copy with memory and context. But a social media post without an image is an invisible post.
The problem: a designer costs $500-$1500/month. Cloud AI tools (DALL-E, Midjourney) cost $20-$60/month each and don't understand your brand. Every image you generate looks different.
The solution: Stable Diffusion running on your infrastructure, with LoRAs that learn your visual identity. Costs $0 additional per image and produces brand-consistent results.
## Why Stable Diffusion Instead of DALL-E or Midjourney?
| Comparison | DALL-E / Midjourney | Stable Diffusion Self-Hosted |
|---|---|---|
| Cost | $20-60/month | $0/image (GPU you already have) |
| Automation API | Yes | Yes (ComfyUI) |
| Visual consistency | Low (loose prompts) | High (brand LoRAs) |
| Data privacy | Your prompts and images go to third-party servers | Everything on your server |
| Technical control | Limited | Full (prompts, models, dimensions) |
Stable Diffusion turns text into images. ComfyUI turns it into an automatable API. LoRAs turn generic results into brand content.
## Step 1: Enable ComfyUI in API Mode
```bash
# Install with GPU support (NVIDIA)
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt

# Download a base model (SDXL recommended for quality)
# Place the .safetensors file in ComfyUI/models/checkpoints/

# Start headless; the HTTP API is always exposed (there is no separate --api flag)
python main.py --listen 0.0.0.0 --port 8188
```
ComfyUI exposes REST endpoints:
- `POST /api/prompt` — queue a workflow for execution
- `GET /api/history/{prompt_id}` — get results
- `GET /api/view?filename={name}` — download the image
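Those three endpoints are enough for a tiny client. A minimal sketch with `requests`, assuming ComfyUI is reachable on localhost:8188 and `workflow` is a JSON workflow dict like the one in Step 4 (the `first_image` and `generate` names are illustrative, not part of ComfyUI):

```python
import time
import requests

BASE = "http://localhost:8188"

def first_image(outputs: dict) -> dict:
    """Pick the first image record out of a history entry's 'outputs' dict."""
    for node_output in outputs.values():
        if "images" in node_output:
            return node_output["images"][0]
    raise ValueError("No images in workflow outputs")

def generate(workflow: dict, timeout: int = 120) -> bytes:
    """Queue a workflow, poll until it finishes, return the raw image bytes."""
    resp = requests.post(f"{BASE}/api/prompt", json={"prompt": workflow})
    resp.raise_for_status()
    prompt_id = resp.json()["prompt_id"]

    deadline = time.time() + timeout
    while time.time() < deadline:
        history = requests.get(f"{BASE}/api/history/{prompt_id}").json()
        if prompt_id in history:  # the entry appears once execution finished
            info = first_image(history[prompt_id]["outputs"])
            img = requests.get(f"{BASE}/api/view", params={
                "filename": info["filename"],
                "subfolder": info.get("subfolder", ""),
                "type": info.get("type", "output"),
            })
            return img.content
        time.sleep(2)
    raise TimeoutError(f"Generation {prompt_id} timed out after {timeout}s")
```

The same submit/poll/download loop is what the n8n nodes in Step 4 do, just spelled out in one place.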
## Step 2: Prompt Structure That Actually Works
A good prompt for social media isn't "a nice tech image." It's a precise technical instruction:
```
(quality_tags) subject_description, style_directive,
lighting_setup, color_palette, camera_framing,
negative_prompt
```
Real example for a Guayoyo Tech post:
```
masterpiece, best quality, 8k, professional photo of a
modern developer workspace with multiple monitors showing
code and dashboards, clean minimalist desk, warm ambient lighting
from desk lamp, blue and teal accent colors (#1A73E8 #22D3EE),
shallow depth of field, 1080x1080 square composition

Negative: lowres, bad anatomy, text, watermark, blurry,
oversaturated, people, hands, messy desk, dark shadows
```
Universal quality tags:
```
masterpiece, best quality, 8k, highres, sharp focus,
intricate details, professional lighting
```
Styles by content type:

- **Technical/DevOps:** `isometric view, clean technical diagram, blueprint aesthetic, dark mode UI`
- **Business:** `corporate photography, glass office, professional atmosphere, natural window light`
- **Abstract/Conceptual:** `digital art, abstract geometry, tech wave, gradient mesh, minimal`
Guayoyo Tech color palette: `blue and teal accents (#1A73E8, #22D3EE) on dark background (#0b1120)`
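These pieces compose mechanically, so the "Prompt Builder" step can be a small helper. A sketch in Python; the constant names and `build_prompt` function are illustrative, not from the article:

```python
# Universal quality tags from Step 2
QUALITY_TAGS = ("masterpiece, best quality, 8k, highres, sharp focus, "
                "intricate details, professional lighting")

# Style directives by content type
STYLES = {
    "devops": "isometric view, clean technical diagram, blueprint aesthetic, dark mode UI",
    "business": "corporate photography, glass office, professional atmosphere, natural window light",
    "abstract": "digital art, abstract geometry, tech wave, gradient mesh, minimal",
}

# Brand palette appended to every prompt
BRAND_PALETTE = "blue and teal accents (#1A73E8, #22D3EE) on dark background (#0b1120)"

def build_prompt(subject: str, style: str) -> str:
    """Assemble quality tags + subject + style directive + palette."""
    return ", ".join([QUALITY_TAGS, subject, STYLES[style], BRAND_PALETTE])

prompt = build_prompt("modern developer workspace with multiple monitors", "devops")
```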
## Step 3: LoRAs — The Consistency Secret
A LoRA (Low-Rank Adaptation) is a mini-model that plugs into Stable Diffusion to teach it a specific concept: your logo, your visual style, your color palette.
### Option A: Use public LoRAs (free)
Thousands available on Civitai.com. For a tech/modern style:
```bash
# Download popular LoRAs
wget https://civitai.com/api/download/models/XXXXX -O ComfyUI/models/loras/tech-style.safetensors
```
In an A1111-style prompt: `<lora:tech-style:0.8> modern developer workspace...` (note that stock ComfyUI does not parse inline `<lora:...>` tags; it loads LoRAs through a LoraLoader node, and the inline syntax needs a custom node).
### Option B: Train your own LoRA (~$2 GPU cloud)
With 10-15 reference images of your brand (website screenshots, previous posts, color palette):
```bash
# Using Kohya SS (open-source tool)
git clone https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
# Train with 10-15 reference images → ~30 min on T4 GPU
```
A LoRA trained on your assets produces images that look like your design team made them.
## Step 4: ComfyUI Workflow as JSON API
ComfyUI uses JSON workflows. Export one from the UI and send it via API:
```json
{
  "3": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 42, "steps": 25, "cfg": 7.5,
      "sampler_name": "euler_ancestral", "scheduler": "normal",
      "denoise": 1.0, "model": ["4", 0],
      "positive": ["6", 0], "negative": ["7", 0],
      "latent_image": ["5", 0]
    }
  },
  "4": { "class_type": "CheckpointLoaderSimple", "inputs": { "ckpt_name": "sd_xl_base_1.0.safetensors" }},
  "5": { "class_type": "EmptyLatentImage", "inputs": { "width": 1080, "height": 1080, "batch_size": 1 }},
  "6": { "class_type": "CLIPTextEncode", "inputs": { "text": "YOUR PROMPT HERE", "clip": ["4", 1] }},
  "7": { "class_type": "CLIPTextEncode", "inputs": { "text": "YOUR NEGATIVE HERE", "clip": ["4", 1] }},
  "8": { "class_type": "VAEDecode", "inputs": { "samples": ["3", 0], "vae": ["4", 2] }},
  "9": { "class_type": "SaveImage", "inputs": { "filename_prefix": "guayoyo_post", "images": ["8", 0] }}
}
```
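To apply a LoRA from Step 3 inside this workflow, stock ComfyUI adds a LoraLoader node between the checkpoint loader and everything downstream. A sketch of the extra node (the node id `"10"` and the filename are illustrative):

```json
"10": {
  "class_type": "LoraLoader",
  "inputs": {
    "lora_name": "tech-style.safetensors",
    "strength_model": 0.8,
    "strength_clip": 0.8,
    "model": ["4", 0],
    "clip": ["4", 1]
  }
}
```

The KSampler's `model` input and both `CLIPTextEncode` `clip` inputs then reference `["10", 0]` and `["10", 1]` instead of the checkpoint loader's outputs.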
n8n HTTP Request node to trigger generation:
```
POST http://localhost:8188/api/prompt
Headers: Content-Type: application/json
Body: { "prompt": {{ $json.comfyui_workflow }} }
// Response: { "prompt_id": "abc-123" }

// Poll and get result:
GET http://localhost:8188/api/history/abc-123

// Download image:
GET http://localhost:8188/api/view?filename={{ $json.filename }}
```
## Step 5: Dimensions by Platform
| Platform | Format | Dimensions |
|---|---|---|
| Instagram Feed (square) | 1:1 | 1080×1080 |
| Instagram Feed (portrait) | 4:5 | 1080×1350 |
| Instagram Story / Reel | 9:16 | 1080×1920 |
| TikTok | 9:16 | 1080×1920 |
| Facebook / LinkedIn | 1.91:1 | 1200×630 |
In the ComfyUI JSON, change `width` and `height` in the `EmptyLatentImage` node per platform. Keep in mind that Stable Diffusion works in latent blocks of 8 pixels, so dimensions that aren't multiples of 8 (like 630) get rounded; generate slightly larger and crop if the exact size matters.
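Patching the workflow per platform is a one-liner per field. A sketch, with a size map taken from the table above (`PLATFORM_SIZES`, `for_platform`, and the rounding to multiples of 8 are my assumptions, not from the article; node `"5"` is the `EmptyLatentImage` node):

```python
import copy

# Width/height per platform; sizes not divisible by 8 are rounded up
# (1350 → 1352, 630 → 632) and can be cropped back after generation.
PLATFORM_SIZES = {
    "instagram_square": (1080, 1080),
    "instagram_portrait": (1080, 1352),
    "story_reel": (1080, 1920),
    "tiktok": (1080, 1920),
    "facebook_linkedin": (1200, 632),
}

def for_platform(workflow: dict, platform: str) -> dict:
    """Return a copy of the workflow with node "5" resized for the platform."""
    width, height = PLATFORM_SIZES[platform]
    wf = copy.deepcopy(workflow)  # don't mutate the shared template
    wf["5"]["inputs"]["width"] = width
    wf["5"]["inputs"]["height"] = height
    return wf
```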
## Step 6: Automatic Quality Validation
Not every generation is good. A validation node using Python Pillow:

```python
import io
import requests
from PIL import Image, ImageStat

img = Image.open(io.BytesIO(requests.get(image_url).content)).convert("RGB")

width, height = img.size
if width < 1000 or height < 1000:
    raise ValueError("Image too small")

# Detect fully black or white images via mean channel brightness
stat = ImageStat.Stat(img)
avg_brightness = sum(stat.mean) / 3
if avg_brightness < 10 or avg_brightness > 245:
    raise ValueError("Image solid black or white")

# Detect low contrast (likely blurry or washed out): the per-channel
# standard deviation stays low when pixels barely deviate from the average
avg_stddev = sum(stat.stddev) / 3
if avg_stddev < 30:
    raise ValueError("Low contrast, probably a failed generation")
```
If validation fails, the flow auto-retries with a different seed.
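The retry loop itself is a thin wrapper. A sketch assuming hypothetical `generate()` and `validate()` callables (where `validate` raises `ValueError` on a bad image, as in the checks above, and node `"3"` is the KSampler):

```python
def generate_with_retry(workflow: dict, generate, validate,
                        max_attempts: int = 3, seed_step: int = 1000) -> bytes:
    """Re-run generation with a bumped seed until validation passes."""
    last_error = None
    for _ in range(max_attempts):
        try:
            image = generate(workflow)
            validate(image)
            return image
        except ValueError as err:
            last_error = err
            # Bump the KSampler seed so the next run samples a fresh image
            workflow["3"]["inputs"]["seed"] += seed_step
    raise RuntimeError(f"All {max_attempts} attempts failed: {last_error}")
```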
## The Image Node in Your Pipeline
Your n8n flow now has 3 new nodes after the agent:
```
Agent (copy) → ComfyUI Prompt Builder → POST ComfyUI API
  → Wait 10s → GET Result → Validation →
      ├─ OK → WhatsApp Preview (Article 4)
      └─ Failed → Retry (seed + 1000)
```
In ~30 seconds you get an on-brand, validated image ready for preview. No designer. No watermark. No subscription.
Want a self-hosted content pipeline that generates brand-consistent visuals? At Guayoyo Tech, we design and implement the complete solution — on your infrastructure, with your data, under your control.