Lars Winstand

Posted on Jun 10 • Originally published at standardcompute.com

I thought creative AI needed better prompts, but it actually needed 4-step LLM routing

#ai #automation #llm #devops

I keep seeing developers try to build a “creative AI agent” by writing one giant prompt and hoping GPT-5 or Claude Opus can do everything.

That usually works for 10 minutes.

Then the real workflow shows up:

research trends
turn those into a usable brief
generate mockups
organize outputs for review
wait for a human decision

At that point, the problem is no longer prompting.
It’s routing.

That clicked for me while reading a small r/openclaw thread from a jewelry designer. The post itself wasn’t huge, but the question was dead-on: they didn’t need more ideas from ChatGPT. They needed an agent that could run more of the workflow.

That’s the important distinction.

Most people say they want AI for creativity.
What they actually want is a repeatable pipeline that turns vague inputs into reviewable deliverables.

The real gap is not ideation

ChatGPT-style brainstorming feels productive because it gives you instant output.

Ask for:

10 product concepts
a seasonal moodboard direction
prompt ideas for image generation
a trend summary from TikTok or Pinterest

You’ll get something useful.

But then you still have to do the annoying part:

check constraints
create multiple directions
name files
sort references
save assets somewhere sane
hand the work to a human

That is not “creative chatting.”
That is orchestration.

Chat-based brainstorming	Agent pipeline
Output is mostly ideas	Output is structured deliverables
State lives in one long conversation	State lives in tasks, folders, and records
Human role is ad hoc prompting	Human role is explicit approval

If a workflow repeats, the answer is usually not “write a better mega-prompt.”

It’s:

break the work into stages
assign the right model to each stage
make handoffs explicit
keep a human in the loop
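A minimal sketch of those four points in Python, assuming each stage is just a function that takes the previous stage’s text and returns its own. The `Stage` class, `run_pipeline`, and the stand-in lambdas are hypothetical names for illustration; no model is actually called here.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Stage:
    name: str                   # e.g. "trend_search"
    model: str                  # label only; this sketch makes no API calls
    run: Callable[[str], str]   # takes the previous output, returns its own

def run_pipeline(stages: list[Stage], initial_input: str) -> dict[str, str]:
    """Run stages in order and record every handoff under its stage name."""
    handoffs: dict[str, str] = {}
    payload = initial_input
    for stage in stages:
        payload = stage.run(payload)
        handoffs[stage.name] = payload
    return handoffs

# Stand-in stage functions; a real version would call a model in each one.
stages = [
    Stage("trend_search", "grok", lambda topic: f"trends for {topic}"),
    Stage("brief_generation", "claude-opus", lambda trends: f"brief from [{trends}]"),
    Stage("human_review", "none", lambda brief: f"PENDING REVIEW: {brief}"),
]

handoffs = run_pipeline(stages, "summer jewelry")
```

Because every handoff is stored by stage name, you can inspect or replay any single step without scrolling back through one long conversation.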

Why one model keeps disappointing you

Because you’re asking one model to be all of these at once:

trend researcher
creative director
manufacturing sanity checker
image prompt writer
file organizer

That’s not a prompt problem.
That’s bad staffing.

The useful setup here is model-specific routing:

Grok for trend search and intake
Claude Opus for creative reasoning and brief writing
GPT-5-class image models for mockups
n8n or Make for storage, naming, and handoff

A general-purpose model can fake this.
It just tends to do it unevenly and expensively.

Single-model workflow	Routed workflow
One model handles every task	Each task gets a model that fits
Failures are vague	Failures are isolated by stage
Easy to prototype	Easier to operate repeatedly
Expensive if every step uses the best model	Cheaper when cheap steps stay cheap
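To make “failures are isolated by stage” concrete, here is a rough sketch, assuming each stage is a plain function. `StageError`, `run_stage`, and `flaky_brief_writer` are hypothetical names, and the retry count is arbitrary.

```python
class StageError(Exception):
    """Raised when one stage fails, so the failure is attributable to it."""
    def __init__(self, stage: str, cause: Exception):
        super().__init__(f"stage '{stage}' failed: {cause}")
        self.stage = stage
        self.cause = cause

def run_stage(name, fn, payload, retries=1):
    """Run a single stage, retrying only that stage before giving up."""
    last_error = None
    for _ in range(retries + 1):
        try:
            return fn(payload)
        except Exception as exc:  # deliberately broad in this sketch
            last_error = exc
    raise StageError(name, last_error)

def flaky_brief_writer(trends):
    # Stand-in for a model call that rejects bad input.
    raise ValueError("empty trend summary")

try:
    run_stage("brief_generation", flaky_brief_writer, "")
    failed_stage = None
except StageError as err:
    failed_stage = err.stage  # "brief_generation"
```

With one mega-prompt, the equivalent failure is just “the output was bad.” Here the error names the stage, and only that stage gets retried.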

My favorite split for this kind of workflow

If I were building this today, I’d split responsibilities like this:

1) Grok for trend intake

Use Grok when the task is web-heavy and signal-oriented.

Examples:

scrape current aesthetic trends
summarize competitor launches
collect references from Pinterest/TikTok/blogs
cluster repeated motifs
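For the last item, a deliberately crude sketch: it counts exact motif tags across references rather than doing real clustering. It assumes the intake step already returns references with a `motifs` list, which is a hypothetical shape, and `repeated_motifs` is an illustrative helper.

```python
from collections import Counter

def repeated_motifs(references: list[dict], min_count: int = 2) -> list[tuple[str, int]]:
    """Count motif tags across references; keep those seen at least min_count times."""
    counts = Counter()
    for ref in references:
        # Normalize case and use a set so one reference counts a motif once.
        counts.update({tag.lower().strip() for tag in ref["motifs"]})
    return [(motif, n) for motif, n in counts.most_common() if n >= min_count]

# Hypothetical intake output; the motif names are sample data.
references = [
    {"source": "pinterest", "motifs": ["shell forms", "brushed silver"]},
    {"source": "tiktok", "motifs": ["Shell Forms", "soft asymmetry"]},
    {"source": "blog", "motifs": ["brushed silver", "shell forms"]},
]

motifs = repeated_motifs(references)
# [("shell forms", 3), ("brushed silver", 2)]
```

Exact-match counting misses near-duplicates like “shell shapes” versus “shell forms”, so treat it as a first pass before any model-based grouping.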

2) Claude Opus for reasoning and brief writing

Use Claude Opus when the task needs taste, synthesis, and contradiction detection.

Examples:

turn trend data into a coherent brief
identify conflicts like “minimalist but highly ornate”
map concepts to customer segment or price point
produce a human-reviewable summary
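Real contradiction detection is a judgment task for the model, but a cheap keyword pre-check can flag the obvious cases before you spend a premium call. This is a sketch of that pre-check only, not of what Claude Opus does; `CONFLICTING_PAIRS` and `find_conflicts` are hypothetical, and the second pair is invented for illustration.

```python
# Hypothetical descriptor pairs that pull against each other.
CONFLICTING_PAIRS = [
    ("minimalist", "ornate"),
    ("budget", "handmade"),
]

def find_conflicts(brief_text: str) -> list[tuple[str, str]]:
    """Return every conflicting pair where both words appear in the brief."""
    cleaned = brief_text.lower().replace(",", " ").replace(".", " ")
    words = set(cleaned.split())
    return [(a, b) for a, b in CONFLICTING_PAIRS if a in words and b in words]

conflicts = find_conflicts("A minimalist ring line, but highly ornate.")
# [("minimalist", "ornate")]
```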

3) GPT-5-class image model for visual exploration

Use image generation only after the brief is approved.

Examples:

generate prompt variants
produce mockups for 3-5 directions
create image batches for review
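The “only after the brief is approved” rule is easy to enforce in code. A minimal sketch, assuming the brief carries an explicit status; `BriefStatus` and `build_image_jobs` are hypothetical names, and no image API is called.

```python
from enum import Enum

class BriefStatus(Enum):
    DRAFT = "draft"
    APPROVED = "approved"
    REJECTED = "rejected"

def build_image_jobs(status: BriefStatus, directions: list[str],
                     variants_per_direction: int = 2) -> list[dict]:
    """Return image-generation jobs, or none at all if the brief is not approved."""
    if status is not BriefStatus.APPROVED:
        return []
    return [
        {"direction": direction, "variant": variant}
        for direction in directions
        for variant in range(1, variants_per_direction + 1)
    ]

directions = ["coastal textures", "shell forms", "soft asymmetry"]
blocked = build_image_jobs(BriefStatus.DRAFT, directions)   # []
jobs = build_image_jobs(BriefStatus.APPROVED, directions)   # 3 directions x 2 variants
```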

4) n8n or Make for the boring grown-up work

This is where a lot of agent demos fall apart.

You still need:

file naming
folder creation
Airtable or Notion updates
Google Drive uploads
Slack notifications
review gates

That is n8n/Make territory, not “just ask the LLM nicely” territory.
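File naming is a good example of work that should be deterministic. The sketch below only illustrates the naming rule in Python; the same logic could live in an n8n or Make step. The path layout, `slugify`, and `asset_path` are assumptions, and the date is arbitrary.

```python
import re
from datetime import date

def slugify(text: str) -> str:
    """Lowercase, replace runs of non-alphanumerics with '-', trim the ends."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")

def asset_path(project: str, direction: str, index: int,
               run_date: date, ext: str = "png") -> str:
    """Build a predictable path: <project>/<date>/<direction>/<direction>-NN.<ext>"""
    direction_slug = slugify(direction)
    return (
        f"{slugify(project)}/{run_date.isoformat()}/"
        f"{direction_slug}/{direction_slug}-{index:02d}.{ext}"
    )

path = asset_path("Summer Jewelry", "Brushed Silver", 3, date(2025, 1, 15))
# "summer-jewelry/2025-01-15/brushed-silver/brushed-silver-03.png"
```

The same inputs always produce the same path, which is exactly the property you can’t rely on when an LLM invents file names.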

What the pipeline actually looks like

Here’s the version I’d actually ship.

main agent
  -> trend search agent (Grok)
  -> brief writer agent (Claude Opus)
  -> constraint checker
  -> image prompt generator
  -> mockup generator (GPT-5-class image model)
  -> output aggregator
  -> n8n/Make workflow for storage and handoff
  -> human approval
  -> optional second pass

And here’s a more concrete JSON-style representation:

{
  "workflow": [
    {
      "step": "trend_search",
      "model": "grok",
      "output": "trend_summary.json"
    },
    {
      "step": "brief_generation",
      "model": "claude-opus",
      "input": "trend_summary.json",
      "output": "creative_brief.md"
    },
    {
      "step": "constraint_check",
      "model": "claude-opus",
      "input": "creative_brief.md",
      "output": "constraints.md"
    },
    {
      "step": "mockup_generation",
      "model": "gpt-5-image",
      "input": ["creative_brief.md", "constraints.md"],
      "output": "mockups/"
    },
    {
      "step": "handoff",
      "tool": "n8n",
      "output": "google_drive + airtable + slack_review"
    }
  ]
}
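A config like this is also easy to sanity-check before anything runs. The sketch below assumes the same `step` / `input` / `output` keys as the JSON above and checks that every input was produced by an earlier step. `missing_inputs` is a hypothetical helper, and the sample config leaves out the constraint check on purpose to show a failure.

```python
import json

def missing_inputs(workflow: list[dict]) -> list[tuple[str, str]]:
    """Return (step, input) pairs where the input is not an earlier step's output."""
    produced: set[str] = set()
    problems = []
    for step in workflow:
        inputs = step.get("input", [])
        if isinstance(inputs, str):   # "input" may be one string or a list
            inputs = [inputs]
        for name in inputs:
            if name not in produced:
                problems.append((step["step"], name))
        if "output" in step:
            produced.add(step["output"])
    return problems

# Same shape as the workflow above, minus the constraint_check step.
config = json.loads("""
{
  "workflow": [
    {"step": "trend_search", "model": "grok", "output": "trend_summary.json"},
    {"step": "brief_generation", "model": "claude-opus",
     "input": "trend_summary.json", "output": "creative_brief.md"},
    {"step": "mockup_generation", "model": "gpt-5-image",
     "input": ["creative_brief.md", "constraints.md"], "output": "mockups/"}
  ]
}
""")

problems = missing_inputs(config["workflow"])
# [("mockup_generation", "constraints.md")]
```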

OpenClaw for agent loops, n8n for production plumbing

I like OpenClaw for agent delegation and multi-step reasoning.

I like n8n and Make for deterministic business-process work.

That split matters.

OpenClaw-style agent setup	n8n or Make automation
Best for iterative agent behavior	Best for explicit workflows
Good at task delegation	Good at connectors and state transitions
Great for experimentation	Better for production handoff

If you try to force OpenClaw to do everything, you end up rebuilding workflow automation badly.

If you try to force n8n to do all the reasoning, you end up with a brittle maze of prompts.

Use each tool for what it’s good at.

The human has to be in the diagram

This part gets skipped in a lot of “autonomous agent” posts.

Creative workflows need approval points.

A human still has to answer:

Is this trend relevant to our customer?
Does this fit the brand?
Is this manufacturable?
Which direction deserves another round?

If you remove that step, you don’t get autonomy.
You get polished nonsense at scale.

The right output is not “final design.”
The right output is a clean review package.

Something like:

trend summary
design brief
constraint check
prompt set
mockup batch
organized assets
human decision

That last step is not failure.
That’s the product.
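One way to treat the review package as a first-class object, sketched under the assumption that each part is plain text or a list of paths. `ReviewPackage` and its field names are hypothetical; they simply mirror the list above.

```python
from dataclasses import dataclass, field
from typing import Optional

@dataclass
class ReviewPackage:
    trend_summary: str = ""
    design_brief: str = ""
    constraint_check: str = ""
    prompt_set: list[str] = field(default_factory=list)
    mockup_paths: list[str] = field(default_factory=list)
    decision: Optional[str] = None  # set by a human, never by the pipeline

    def missing_parts(self) -> list[str]:
        """Names of the parts that are still empty (the decision is excluded)."""
        parts = {
            "trend_summary": self.trend_summary,
            "design_brief": self.design_brief,
            "constraint_check": self.constraint_check,
            "prompt_set": self.prompt_set,
            "mockup_paths": self.mockup_paths,
        }
        return [name for name, value in parts.items() if not value]

    def ready_for_review(self) -> bool:
        return not self.missing_parts()

package = ReviewPackage(
    trend_summary="coastal textures, shell forms",
    design_brief="Summer line brief",
)
```

A package that is not `ready_for_review()` never reaches a reviewer, and `decision` stays `None` until a person fills it in.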

The cost problem is real

This kind of workflow is iterative by default.

That means cost can explode if every stage uses the most expensive model.

And this is exactly where teams building agents start feeling token anxiety:

every retry costs money
every branch costs money
every background run costs money
every automation becomes something you have to monitor financially

Cheap steps should stay cheap.
Expensive models should be reserved for the places where quality actually matters.

A sane routing pattern looks like this:

cheap/local model for classification, labeling, cleanup
mid-tier model for standard agent tasks
premium model for synthesis, judgment, or final review

That principle matters more than the exact vendor lineup.
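A small worked example of why this matters, using made-up relative cost units rather than real vendor prices. `TIER_BY_TASK`, `COST_PER_CALL`, and both functions are hypothetical; unknown tasks default to the mid tier.

```python
# Hypothetical task-to-tier table following the three tiers above.
TIER_BY_TASK = {
    "classification": "cheap", "labeling": "cheap", "cleanup": "cheap",
    "synthesis": "premium", "judgment": "premium", "final_review": "premium",
}

# Made-up relative cost units per call, not real prices.
COST_PER_CALL = {"cheap": 1, "mid": 5, "premium": 25}

def pick_tier(task: str) -> str:
    """Look up a task's tier; anything unlisted is a standard mid-tier task."""
    return TIER_BY_TASK.get(task, "mid")

def run_cost(tasks: list[str], routed: bool = True) -> int:
    """Total cost of a run, either routed by tier or all on the premium tier."""
    tiers = [pick_tier(task) if routed else "premium" for task in tasks]
    return sum(COST_PER_CALL[tier] for tier in tiers)

tasks = ["labeling", "cleanup", "agent_step", "agent_step", "synthesis"]
routed_cost = run_cost(tasks)                     # 1 + 1 + 5 + 5 + 25 = 37
all_premium_cost = run_cost(tasks, routed=False)  # 5 * 25 = 125
```

The exact ratio depends entirely on the numbers you plug in, but the shape holds: retries and background runs multiply whatever the per-run cost is.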

Example: a practical implementation sketch

Here’s a very stripped-down Python example showing stage routing through an OpenAI-compatible client.

If you’re using Standard Compute, you can keep the OpenAI-compatible API shape and send each stage to a different model, without redesigning your entire app around per-token cost paranoia.

import os

from openai import OpenAI

# One OpenAI-compatible client for every stage; only the model name changes.
client = OpenAI(
    base_url="https://api.standardcompute.com/v1",
    # Read the key from the environment instead of hard-coding it.
    api_key=os.environ["STANDARD_COMPUTE_API_KEY"],
)

def complete(model, system_prompt, user_content):
    """Send one system + user exchange and return the reply text."""
    response = client.chat.completions.create(
        model=model,
        messages=[
            {"role": "system", "content": system_prompt},
            {"role": "user", "content": user_content},
        ],
    )
    # Return plain text so one stage's output can be the next stage's input.
    return response.choices[0].message.content

def run_trend_search(topic):
    return complete(
        "grok-4.20",
        "Find current trend signals and summarize them.",
        topic,
    )

def write_brief(trend_summary):
    return complete(
        "claude-opus-4.6",
        "Turn trend research into a concise creative brief with constraints.",
        trend_summary,
    )

def generate_mockup_prompts(brief):
    return complete(
        "gpt-5.4",
        "Generate image prompts for 4 distinct visual directions.",
        brief,
    )

And if you want to test the API with curl:

curl https://api.standardcompute.com/v1/chat/completions \
  -H "Authorization: Bearer $STANDARD_COMPUTE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4.6",
    "messages": [
      {"role": "system", "content": "Write a creative brief from trend research."},
      {"role": "user", "content": "Summer jewelry trends: coastal textures, shell forms, brushed silver, soft asymmetry."}
    ]
  }'

Keeping one API shape across models matters for developers because the best routing strategy is often operationally annoying under normal per-token pricing.

If your workflow runs every day across n8n, Make, Zapier, OpenClaw, or custom agents, cost predictability becomes part of system design, not just finance.

That’s the part a lot of AI blog posts skip.

What to automate first

Not image generation.

That’s the flashy trap.

Start with trend intake and brief generation.

Why?
Because consistency starts upstream.

If your inputs are messy, your mockups will just be messy faster and more expensively.

This is the order I’d use:

1. scheduled trend search via Grok or OpenClaw search
2. brief generation via Claude Opus
3. constraint check against real-world limitations
4. prompt set generation for multiple directions
5. mockup generation with a GPT-5-class image model
6. asset organization in Google Drive, Airtable, or Notion via n8n/Make
7. human review gate before second-round exploration

That is much less magical than “AI designs my product line.”

It is also the version that survives contact with production.

Why this matters for developers building agents

The lesson here is bigger than jewelry or design workflows.

If you’re building AI agents for any repeatable business process, the pattern is the same:

one model is rarely the best worker for every job
routing beats mega-prompts
explicit handoffs beat giant chat histories
human approval beats fake autonomy
predictable cost matters if the workflow runs constantly

That last point is why products like Standard Compute are interesting for agent builders.

If you’re wiring together OpenClaw, n8n, Make, Zapier, or your own background workers, the hard part is not just getting good outputs.

It’s getting good outputs repeatedly without turning every automation into a billing event you have to babysit.

Unlimited AI compute with an OpenAI-compatible API is not just a pricing trick.
It changes what kinds of multi-step agent workflows are practical to run all day.

Final take

The useful creative assistant is not the one that gives you more ideas.

It’s the one that shows up tomorrow with:

research already collected
a brief already written
mockups already grouped
assets already organized
a clear place for a human to say yes or no

That’s not better prompting.
That’s better routing.

And honestly, once you see the difference, it’s hard to go back to one giant chat window pretending to be a workflow.
