In the previous article you built an agent that generates on-brand copy with memory and context. But a social media post without an image is an invisible post.
The problem: a designer costs $500-$1500/month. Cloud AI tools (DALL-E, Midjourney) cost $20-$60/month each and don't understand your brand. Every image you generate looks different.
The solution: Stable Diffusion running on your infrastructure, with LoRAs that learn your visual identity. Costs $0 additional per image and produces brand-consistent results.
## Why Stable Diffusion Instead of DALL-E or Midjourney?
| Comparison | DALL-E / Midjourney | Stable Diffusion Self-Hosted |
|---|---|---|
| Cost | $20-60/month | $0/image (GPU you already have) |
| Automation API | Yes | Yes (ComfyUI) |
| Visual consistency | Low (loose prompts) | High (brand LoRAs) |
| Data privacy | Your prompts and images go to third-party servers | Everything on your server |
| Technical control | Limited | Full (prompts, models, dimensions) |
Stable Diffusion turns text into images. ComfyUI turns it into an automatable API. LoRAs turn generic results into brand content.
## Step 1: Enable ComfyUI in API Mode
```bash
# Install with GPU support (NVIDIA)
git clone https://github.com/comfyanonymous/ComfyUI.git
cd ComfyUI
pip install -r requirements.txt

# Download a base model (SDXL recommended for quality)
# Place the .safetensors file in ComfyUI/models/checkpoints/

# Start headless; the HTTP API is always exposed (there is no separate --api flag)
python main.py --listen 0.0.0.0 --port 8188
```
ComfyUI exposes REST endpoints:
- `POST /api/prompt` — queue a workflow for execution
- `GET /api/history/{prompt_id}` — get results
- `GET /api/view?filename={name}` — download the image
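Those three endpoints are enough for a tiny client. A minimal sketch with `requests`, assuming ComfyUI is reachable on localhost:8188 and `workflow` is a JSON workflow dict like the one in Step 4 (the `first_image` and `generate` names are illustrative, not part of ComfyUI):

```python
import time
import requests

BASE = "http://localhost:8188"

def first_image(outputs: dict) -> dict:
    """Pick the first image record out of a history entry's 'outputs' dict."""
    for node_output in outputs.values():
        if "images" in node_output:
            return node_output["images"][0]
    raise ValueError("No images in workflow outputs")

def generate(workflow: dict, timeout: int = 120) -> bytes:
    """Queue a workflow, poll until it finishes, return the raw image bytes."""
    resp = requests.post(f"{BASE}/api/prompt", json={"prompt": workflow})
    resp.raise_for_status()
    prompt_id = resp.json()["prompt_id"]

    deadline = time.time() + timeout
    while time.time() < deadline:
        history = requests.get(f"{BASE}/api/history/{prompt_id}").json()
        if prompt_id in history:  # the entry appears once execution finished
            info = first_image(history[prompt_id]["outputs"])
            img = requests.get(f"{BASE}/api/view", params={
                "filename": info["filename"],
                "subfolder": info.get("subfolder", ""),
                "type": info.get("type", "output"),
            })
            return img.content
        time.sleep(2)
    raise TimeoutError(f"Generation {prompt_id} timed out after {timeout}s")
```

The same submit/poll/download loop is what the n8n nodes in Step 4 do, just spelled out in one place.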
## Step 2: Prompt Structure That Actually Works
A good prompt for social media isn't "a nice tech image." It's a precise technical instruction:
```
(quality_tags) subject_description, style_directive,
lighting_setup, color_palette, camera_framing,
negative_prompt
```
Real example for a Guayoyo Tech post:
```
masterpiece, best quality, 8k, professional photo of a
modern developer workspace with multiple monitors showing
code and dashboards, clean minimalist desk, warm ambient lighting
from desk lamp, blue and teal accent colors (#1A73E8 #22D3EE),
shallow depth of field, 1080x1080 square composition

Negative: lowres, bad anatomy, text, watermark, blurry,
oversaturated, people, hands, messy desk, dark shadows
```
Universal quality tags:
```
masterpiece, best quality, 8k, highres, sharp focus,
intricate details, professional lighting
```
Styles by content type:

- **Technical/DevOps:** `isometric view, clean technical diagram, blueprint aesthetic, dark mode UI`
- **Business:** `corporate photography, glass office, professional atmosphere, natural window light`
- **Abstract/Conceptual:** `digital art, abstract geometry, tech wave, gradient mesh, minimal`
Guayoyo Tech color palette: `blue and teal accents (#1A73E8, #22D3EE) on dark background (#0b1120)`
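These pieces compose mechanically, so the "Prompt Builder" step can be a small helper. A sketch in Python; the constant names and `build_prompt` function are illustrative, not from the article:

```python
# Universal quality tags from Step 2
QUALITY_TAGS = ("masterpiece, best quality, 8k, highres, sharp focus, "
                "intricate details, professional lighting")

# Style directives by content type
STYLES = {
    "devops": "isometric view, clean technical diagram, blueprint aesthetic, dark mode UI",
    "business": "corporate photography, glass office, professional atmosphere, natural window light",
    "abstract": "digital art, abstract geometry, tech wave, gradient mesh, minimal",
}

# Brand palette appended to every prompt
BRAND_PALETTE = "blue and teal accents (#1A73E8, #22D3EE) on dark background (#0b1120)"

def build_prompt(subject: str, style: str) -> str:
    """Assemble quality tags + subject + style directive + palette."""
    return ", ".join([QUALITY_TAGS, subject, STYLES[style], BRAND_PALETTE])

prompt = build_prompt("modern developer workspace with multiple monitors", "devops")
```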
## Step 3: LoRAs — The Consistency Secret
A LoRA (Low-Rank Adaptation) is a mini-model that plugs into Stable Diffusion to teach it a specific concept: your logo, your visual style, your color palette.
### Option A: Use public LoRAs (free)
Thousands available on Civitai.com. For a tech/modern style:
```bash
# Download popular LoRAs
wget https://civitai.com/api/download/models/XXXXX -O ComfyUI/models/loras/tech-style.safetensors
```
In an A1111-style prompt: `<lora:tech-style:0.8> modern developer workspace...` (note that stock ComfyUI does not parse inline `<lora:...>` tags; it loads LoRAs through a LoraLoader node, and the inline syntax needs a custom node).
### Option B: Train your own LoRA (~$2 GPU cloud)
With 10-15 reference images of your brand (website screenshots, previous posts, color palette):
```bash
# Using Kohya SS (open-source tool)
git clone https://github.com/bmaltais/kohya_ss.git
cd kohya_ss
# Train with 10-15 reference images → ~30 min on T4 GPU
```
A LoRA trained on your assets produces images that look like your design team made them.
## Step 4: ComfyUI Workflow as JSON API
ComfyUI uses JSON workflows. Export one from the UI and send it via API:
```json
{
  "3": {
    "class_type": "KSampler",
    "inputs": {
      "seed": 42, "steps": 25, "cfg": 7.5,
      "sampler_name": "euler_ancestral", "scheduler": "normal",
      "denoise": 1.0, "model": ["4", 0],
      "positive": ["6", 0], "negative": ["7", 0],
      "latent_image": ["5", 0]
    }
  },
  "4": { "class_type": "CheckpointLoaderSimple", "inputs": { "ckpt_name": "sd_xl_base_1.0.safetensors" }},
  "5": { "class_type": "EmptyLatentImage", "inputs": { "width": 1080, "height": 1080, "batch_size": 1 }},
  "6": { "class_type": "CLIPTextEncode", "inputs": { "text": "YOUR PROMPT HERE", "clip": ["4", 1] }},
  "7": { "class_type": "CLIPTextEncode", "inputs": { "text": "YOUR NEGATIVE HERE", "clip": ["4", 1] }},
  "8": { "class_type": "VAEDecode", "inputs": { "samples": ["3", 0], "vae": ["4", 2] }},
  "9": { "class_type": "SaveImage", "inputs": { "filename_prefix": "guayoyo_post", "images": ["8", 0] }}
}
```
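To apply a LoRA from Step 3 inside this workflow, stock ComfyUI adds a LoraLoader node between the checkpoint loader and everything downstream. A sketch of the extra node (the node id `"10"` and the filename are illustrative):

```json
"10": {
  "class_type": "LoraLoader",
  "inputs": {
    "lora_name": "tech-style.safetensors",
    "strength_model": 0.8,
    "strength_clip": 0.8,
    "model": ["4", 0],
    "clip": ["4", 1]
  }
}
```

The KSampler's `model` input and both `CLIPTextEncode` `clip` inputs then reference `["10", 0]` and `["10", 1]` instead of the checkpoint loader's outputs.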
n8n HTTP Request node to trigger generation:
```
POST http://localhost:8188/api/prompt
Headers: Content-Type: application/json
Body: { "prompt": {{ $json.comfyui_workflow }} }
// Response: { "prompt_id": "abc-123" }

// Poll and get result:
GET http://localhost:8188/api/history/abc-123

// Download image:
GET http://localhost:8188/api/view?filename={{ $json.filename }}
```
## Step 5: Dimensions by Platform
| Platform | Format | Dimensions |
|---|---|---|
| Instagram Feed (square) | 1:1 | 1080×1080 |
| Instagram Feed (portrait) | 4:5 | 1080×1350 |
| Instagram Story / Reel | 9:16 | 1080×1920 |
| TikTok | 9:16 | 1080×1920 |
| Facebook / LinkedIn | 1.91:1 | 1200×630 |
In the ComfyUI JSON, change `width` and `height` in the `EmptyLatentImage` node per platform. Keep in mind that Stable Diffusion works in latent blocks of 8 pixels, so dimensions that aren't multiples of 8 (like 630) get rounded; generate slightly larger and crop if the exact size matters.
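Patching the workflow per platform is a one-liner per field. A sketch, with a size map taken from the table above (`PLATFORM_SIZES`, `for_platform`, and the rounding to multiples of 8 are my assumptions, not from the article; node `"5"` is the `EmptyLatentImage` node):

```python
import copy

# Width/height per platform; sizes not divisible by 8 are rounded up
# (1350 → 1352, 630 → 632) and can be cropped back after generation.
PLATFORM_SIZES = {
    "instagram_square": (1080, 1080),
    "instagram_portrait": (1080, 1352),
    "story_reel": (1080, 1920),
    "tiktok": (1080, 1920),
    "facebook_linkedin": (1200, 632),
}

def for_platform(workflow: dict, platform: str) -> dict:
    """Return a copy of the workflow with node "5" resized for the platform."""
    width, height = PLATFORM_SIZES[platform]
    wf = copy.deepcopy(workflow)  # don't mutate the shared template
    wf["5"]["inputs"]["width"] = width
    wf["5"]["inputs"]["height"] = height
    return wf
```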
## Step 6: Automatic Quality Validation
Not every generation is good. A validation node using Python Pillow:

```python
import io
import requests
from PIL import Image, ImageStat

img = Image.open(io.BytesIO(requests.get(image_url).content)).convert("RGB")

width, height = img.size
if width < 1000 or height < 1000:
    raise ValueError("Image too small")

# Detect fully black or white images via mean channel brightness
stat = ImageStat.Stat(img)
avg_brightness = sum(stat.mean) / 3
if avg_brightness < 10 or avg_brightness > 245:
    raise ValueError("Image solid black or white")

# Detect low contrast (likely blurry or washed out): the per-channel
# standard deviation stays low when pixels barely deviate from the average
avg_stddev = sum(stat.stddev) / 3
if avg_stddev < 30:
    raise ValueError("Low contrast, probably a failed generation")
```
If validation fails, the flow auto-retries with a different seed.
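The retry loop itself is a thin wrapper. A sketch assuming hypothetical `generate()` and `validate()` callables (where `validate` raises `ValueError` on a bad image, as in the checks above, and node `"3"` is the KSampler):

```python
def generate_with_retry(workflow: dict, generate, validate,
                        max_attempts: int = 3, seed_step: int = 1000) -> bytes:
    """Re-run generation with a bumped seed until validation passes."""
    last_error = None
    for _ in range(max_attempts):
        try:
            image = generate(workflow)
            validate(image)
            return image
        except ValueError as err:
            last_error = err
            # Bump the KSampler seed so the next run samples a fresh image
            workflow["3"]["inputs"]["seed"] += seed_step
    raise RuntimeError(f"All {max_attempts} attempts failed: {last_error}")
```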
## The Image Node in Your Pipeline
Your n8n flow now has 3 new nodes after the agent:
```
Agent (copy) → ComfyUI Prompt Builder → POST ComfyUI API
  → Wait 10s → GET Result → Validation →
      ├─ OK → WhatsApp Preview (Article 4)
      └─ Failed → Retry (seed + 1000)
```
In ~30 seconds you get an on-brand, validated image ready for preview. No designer. No watermark. No subscription.
Want a self-hosted content pipeline that generates brand-consistent visuals? At Guayoyo Tech, we design and implement the complete solution — on your infrastructure, with your data, under your control.