Lars Winstand

Posted on • Originally published at standardcompute.com

I thought creative AI needed better prompts, but it actually needed 4-step LLM routing

I keep seeing developers try to build a “creative AI agent” by writing one giant prompt and hoping GPT-5 or Claude Opus can do everything.

That usually works for 10 minutes.

Then the real workflow shows up:

  • research trends
  • turn those into a usable brief
  • generate mockups
  • organize outputs for review
  • wait for a human decision

At that point, the problem is no longer prompting.
It’s routing.

That clicked for me while reading a small r/openclaw thread from a jewelry designer. The post itself wasn’t huge, but the question was dead-on: they didn’t need more ideas from ChatGPT. They needed an agent that could run more of the workflow.

That’s the important distinction.

Most people say they want AI for creativity.
What they actually want is a repeatable pipeline that turns vague inputs into reviewable deliverables.

The real gap is not ideation

ChatGPT-style brainstorming feels productive because it gives you instant output.

Ask for:

  • 10 product concepts
  • a seasonal moodboard direction
  • prompt ideas for image generation
  • a trend summary from TikTok or Pinterest

You’ll get something useful.

But then you still have to do the annoying part:

  • check constraints
  • create multiple directions
  • name files
  • sort references
  • save assets somewhere sane
  • hand the work to a human

That is not “creative chatting.”
That is orchestration.

| Chat-based brainstorming | Agent pipeline |
| --- | --- |
| Output is mostly ideas | Output is structured deliverables |
| State lives in one long conversation | State lives in tasks, folders, and records |
| Human role is ad hoc prompting | Human role is explicit approval |

If a workflow repeats, the answer is usually not “write a better mega-prompt.”

It’s:

  1. break the work into stages
  2. assign the right model to each stage
  3. make handoffs explicit
  4. keep a human in the loop
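
Here's a minimal sketch of those four steps in Python. Everything in it is an illustration rather than a real framework: `Stage`, `run_pipeline`, and the lambda stand-ins are names I made up, and a real stage would call its model instead of formatting a string.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Stage:
    name: str                      # e.g. "trend_search"
    model: str                     # model label assigned to this stage
    run: Callable[[str], str]      # takes the previous handoff, returns the next


def run_pipeline(stages: list[Stage], initial_input: str,
                 approve: Callable[[str, str], bool]) -> dict[str, str]:
    """Run stages in order; stop as soon as the human rejects a handoff."""
    handoffs: dict[str, str] = {}
    current = initial_input
    for stage in stages:
        current = stage.run(current)
        handoffs[stage.name] = current          # explicit, inspectable handoff
        if not approve(stage.name, current):    # human in the loop
            break
    return handoffs


# Stand-in stage functions (hypothetical); a real stage would call its model.
stages = [
    Stage("trend_search", "grok", lambda topic: f"trends for {topic}"),
    Stage("brief", "claude-opus", lambda trends: f"brief based on {trends}"),
]
result = run_pipeline(stages, "summer jewelry", approve=lambda name, out: True)
print(result["brief"])
```

The useful property is that every handoff is stored under its stage name, so a reviewer can look at what each stage produced instead of scrolling one long chat.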

Why one model keeps disappointing you

Because you’re asking one model to be all of these at once:

  • trend researcher
  • creative director
  • manufacturing sanity checker
  • image prompt writer
  • file organizer

That’s not a prompt problem.
That’s bad staffing.

The useful setup here is model-specific routing:

  • Grok for trend search and intake
  • Claude Opus for creative reasoning and brief writing
  • GPT-5-class image models for mockups
  • n8n or Make for storage, naming, and handoff

A general-purpose model can fake this.
It just tends to do it unevenly and expensively.

| Single-model workflow | Routed workflow |
| --- | --- |
| One model handles every task | Each task gets a model that fits |
| Failures are vague | Failures are isolated by stage |
| Easy to prototype | Easier to operate repeatedly |
| Expensive if every step uses the best model | Cheaper when cheap steps stay cheap |
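
To make "failures are isolated by stage" concrete, here's a rough sketch. The `ROUTES` table, `StageError`, and `run_stage` are hypothetical names, and the handler is a deliberately broken stand-in for a real model call.

```python
class StageError(Exception):
    """Raised when one stage fails; records which stage and which model."""

    def __init__(self, stage: str, model: str, cause: Exception):
        super().__init__(f"{stage} ({model}) failed: {cause}")
        self.stage = stage
        self.model = model


# Hypothetical routing table: stage name -> model label.
ROUTES = {
    "trend_search": "grok",
    "brief_generation": "claude-opus",
    "mockup_generation": "gpt-5-image",
}


def run_stage(stage, handler, payload):
    """Look up the stage's model, run the handler, tag any failure."""
    model = ROUTES[stage]
    try:
        return handler(model, payload)
    except Exception as exc:
        raise StageError(stage, model, exc) from exc


def broken_brief_writer(model, payload):
    # Stand-in for a model call that rejects bad input.
    raise ValueError("empty trend summary")


try:
    run_stage("brief_generation", broken_brief_writer, "")
except StageError as err:
    failed = err   # keep a reference; `err` is cleared after the except block
    print(err)
```

With one model doing everything, that same failure would surface as "the output looks off." Here it arrives labeled with the stage and the model that produced it.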

My favorite split for this kind of workflow

If I were building this today, I’d split responsibilities like this:

1) Grok for trend intake

Use Grok when the task is web-heavy and signal-oriented.

Examples:

  • scrape current aesthetic trends
  • summarize competitor launches
  • collect references from Pinterest/TikTok/blogs
  • cluster repeated motifs

2) Claude Opus for reasoning and brief writing

Use Claude Opus when the task needs taste, synthesis, and contradiction detection.

Examples:

  • turn trend data into a coherent brief
  • identify conflicts like “minimalist but highly ornate”
  • map concepts to customer segment or price point
  • produce a human-reviewable summary

3) GPT-5-class image model for visual exploration

Use image generation only after the brief is approved.

Examples:

  • generate prompt variants
  • produce mockups for 3-5 directions
  • create image batches for review

4) n8n or Make for the boring grown-up work

This is where a lot of agent demos fall apart.

You still need:

  • file naming
  • folder creation
  • Airtable or Notion updates
  • Google Drive uploads
  • Slack notifications
  • review gates

That is n8n/Make territory, not “just ask the LLM nicely” territory.
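
File naming is a good example of work that should be plain deterministic code, wherever it ends up running. A small sketch, assuming a `<project>/<date>/<direction>/<index>` layout that I picked for illustration (`slugify` and `asset_path` are hypothetical helpers):

```python
import re
from datetime import date


def slugify(text: str) -> str:
    """Lowercase, replace runs of non-alphanumerics with '-', trim dashes."""
    return re.sub(r"[^a-z0-9]+", "-", text.lower()).strip("-")


def asset_path(project: str, direction: str, index: int, run_date: date,
               ext: str = "png") -> str:
    """Build a sortable path: <project>/<date>/<direction>/<nn>.<ext>."""
    return (f"{slugify(project)}/{run_date.isoformat()}/"
            f"{slugify(direction)}/{index:02d}.{ext}")


path = asset_path("Summer Jewelry", "Brushed Silver / Shell Forms", 3,
                  date(2025, 6, 1))
print(path)
```

The same inputs always produce the same path, which is exactly what you can't promise when a model invents the filename.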

What the pipeline actually looks like

Here’s the version I’d actually ship.

main agent
  -> trend search agent (Grok)
  -> brief writer agent (Claude Opus)
  -> constraint checker
  -> image prompt generator
  -> mockup generator (GPT-5-class image model)
  -> output aggregator
  -> n8n/Make workflow for storage and handoff
  -> human approval
  -> optional second pass

And here’s a more concrete JSON-style representation:

{
  "workflow": [
    {
      "step": "trend_search",
      "model": "grok",
      "output": "trend_summary.json"
    },
    {
      "step": "brief_generation",
      "model": "claude-opus",
      "input": "trend_summary.json",
      "output": "creative_brief.md"
    },
    {
      "step": "constraint_check",
      "model": "claude-opus",
      "input": "creative_brief.md",
      "output": "constraints.md"
    },
    {
      "step": "mockup_generation",
      "model": "gpt-5-image",
      "input": ["creative_brief.md", "constraints.md"],
      "output": "mockups/"
    },
    {
      "step": "handoff",
      "tool": "n8n",
      "output": "google_drive + airtable + slack_review"
    }
  ]
}
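
One thing a config like this buys you is that handoffs can be checked before anything runs. Here's a small sketch, assuming the same `step` / `input` / `output` keys as the JSON above; `missing_inputs` is a hypothetical helper, and I left out the `handoff` step because it declares no input.

```python
import json

WORKFLOW_JSON = """
{
  "workflow": [
    {"step": "trend_search", "model": "grok", "output": "trend_summary.json"},
    {"step": "brief_generation", "model": "claude-opus",
     "input": "trend_summary.json", "output": "creative_brief.md"},
    {"step": "constraint_check", "model": "claude-opus",
     "input": "creative_brief.md", "output": "constraints.md"},
    {"step": "mockup_generation", "model": "gpt-5-image",
     "input": ["creative_brief.md", "constraints.md"], "output": "mockups/"}
  ]
}
"""


def missing_inputs(workflow: list[dict]) -> list[tuple[str, str]]:
    """Return (step, input) pairs whose input no earlier step produced."""
    produced: set[str] = set()
    problems = []
    for step in workflow:
        inputs = step.get("input", [])
        if isinstance(inputs, str):       # allow a single input or a list
            inputs = [inputs]
        for name in inputs:
            if name not in produced:
                problems.append((step["step"], name))
        produced.add(step["output"])
    return problems


steps = json.loads(WORKFLOW_JSON)["workflow"]
print(missing_inputs(steps))        # full config: nothing dangling
print(missing_inputs(steps[1:]))    # drop trend_search to see a broken handoff
```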

OpenClaw for agent loops, n8n for production plumbing

I like OpenClaw for agent delegation and multi-step reasoning.

I like n8n and Make for deterministic business-process work.

That split matters.

| OpenClaw-style agent setup | n8n or Make automation |
| --- | --- |
| Best for iterative agent behavior | Best for explicit workflows |
| Good at task delegation | Good at connectors and state transitions |
| Great for experimentation | Better for production handoff |

If you try to force OpenClaw to do everything, you end up rebuilding workflow automation badly.

If you try to force n8n to do all the reasoning, you end up with a brittle maze of prompts.

Use each tool for what it’s good at.

The human has to be in the diagram

This part gets skipped in a lot of “autonomous agent” posts.

Creative workflows need approval points.

A human still has to answer:

  • Is this trend relevant to our customer?
  • Does this fit the brand?
  • Is this manufacturable?
  • Which direction deserves another round?

If you remove that step, you don’t get autonomy.
You get polished nonsense at scale.

The right output is not “final design.”
The right output is a clean review package.

Something like:

  1. trend summary
  2. design brief
  3. constraint check
  4. prompt set
  5. mockup batch
  6. organized assets
  7. human decision

That last step is not failure.
That’s the product.
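
If it helps to see the review package as a data structure, here's one way to sketch it. The `ReviewPackage` class and its field names are my own illustration of the seven items above, not a format any of these tools define.

```python
from dataclasses import dataclass, field, fields
from typing import Optional


@dataclass
class ReviewPackage:
    trend_summary: str = ""
    design_brief: str = ""
    constraint_check: str = ""
    prompt_set: list[str] = field(default_factory=list)
    mockup_batch: list[str] = field(default_factory=list)   # file paths
    organized_assets: str = ""                               # folder location
    human_decision: Optional[str] = None                     # set by the reviewer

    def missing(self) -> list[str]:
        """Names of machine-produced parts that are still empty."""
        return [f.name for f in fields(self)
                if f.name != "human_decision" and not getattr(self, f.name)]

    def ready_for_review(self) -> bool:
        """True once every machine-produced part has been filled in."""
        return not self.missing()


package = ReviewPackage(trend_summary="coastal textures",
                        design_brief="shell forms in brushed silver")
print(package.missing())
print(package.ready_for_review())
```

Note that `human_decision` is deliberately excluded from the completeness check: the pipeline's job is to get everything else ready, and the decision is the one field it never fills in itself.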

The cost problem is real

This kind of workflow is iterative by default.

That means cost can explode if every stage uses the most expensive model.

And this is exactly where teams building agents start feeling token anxiety:

  • every retry costs money
  • every branch costs money
  • every background run costs money
  • every automation becomes something you have to monitor financially

Cheap steps should stay cheap.
Expensive models should be reserved for the places where quality actually matters.

A sane routing pattern looks like this:

  • cheap/local model for classification, labeling, cleanup
  • mid-tier model for standard agent tasks
  • premium model for synthesis, judgment, or final review

That principle matters more than the exact vendor lineup.
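
A rough sketch of that tiering, with no vendor names on purpose. The task labels and tier names are hypothetical; the point is only that the routing decision is a lookup, not a prompt.

```python
# Hypothetical tier names and task labels; swap in whatever your stack uses.
TIER_BY_TASK = {
    "classification": "cheap",
    "labeling": "cheap",
    "cleanup": "cheap",
    "synthesis": "premium",
    "judgment": "premium",
    "final_review": "premium",
}


def pick_tier(task_type: str) -> str:
    """Anything not explicitly cheap or premium falls to the mid tier."""
    return TIER_BY_TASK.get(task_type, "mid")


def premium_share(task_types: list[str]) -> float:
    """Fraction of a run's calls that land on the premium tier."""
    if not task_types:
        return 0.0
    premium = sum(1 for t in task_types if pick_tier(t) == "premium")
    return premium / len(task_types)


run = ["classification", "labeling", "agent_step", "agent_step", "synthesis"]
print([pick_tier(t) for t in run])
print(premium_share(run))
```

Tracking something like `premium_share` per run is a cheap way to notice when a workflow has drifted into sending everything to the most expensive model.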

Example: a practical implementation sketch

Here’s a very stripped-down Python example showing stage routing through an OpenAI-compatible client.

If you’re using Standard Compute, the point is that you can keep the OpenAI-compatible API shape while routing workloads across different models without redesigning your entire app around per-token cost paranoia.

from openai import OpenAI

client = OpenAI(
    base_url="https://api.standardcompute.com/v1",
    api_key="YOUR_STANDARD_COMPUTE_API_KEY"
)

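def run_trend_search(topic):
    # Stage 1: trend intake. Returns the summary text, not the raw response,
    # so it can be passed straight into write_brief().
    response = client.chat.completions.create(
        model="grok-4.20",
        messages=[
            {"role": "system", "content": "Find current trend signals and summarize them."},
            {"role": "user", "content": topic}
        ]
    )
    return response.choices[0].message.content

def write_brief(trend_summary):
    # Stage 2: turn the trend summary (a string) into a creative brief.
    response = client.chat.completions.create(
        model="claude-opus-4.6",
        messages=[
            {"role": "system", "content": "Turn trend research into a concise creative brief with constraints."},
            {"role": "user", "content": trend_summary}
        ]
    )
    return response.choices[0].message.content

def generate_mockup_prompts(brief):
    # Stage 3: turn the approved brief (a string) into image prompts.
    response = client.chat.completions.create(
        model="gpt-5.4",
        messages=[
            {"role": "system", "content": "Generate image prompts for 4 distinct visual directions."},
            {"role": "user", "content": brief}
        ]
    )
    return response.choices[0].message.content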

And if you want to test the API with curl:

curl https://api.standardcompute.com/v1/chat/completions \
  -H "Authorization: Bearer $STANDARD_COMPUTE_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "model": "claude-opus-4.6",
    "messages": [
      {"role": "system", "content": "Write a creative brief from trend research."},
      {"role": "user", "content": "Summer jewelry trends: coastal textures, shell forms, brushed silver, soft asymmetry."}
    ]
  }'

Being able to swap models behind one OpenAI-compatible endpoint matters for developers because the best routing strategy is often operationally annoying under normal per-token pricing.

If your workflow runs every day across n8n, Make, Zapier, OpenClaw, or custom agents, cost predictability becomes part of system design, not just finance.

That’s the part a lot of AI blog posts skip.

What to automate first

Not image generation.

That’s the flashy trap.

Start with trend intake and brief generation.

Why?
Because consistency starts upstream.

If your inputs are messy, your mockups will just be messy faster and more expensively.

This is the order I’d use:

  1. scheduled trend search via Grok or OpenClaw search
  2. brief generation via Claude Opus
  3. constraint check against real-world limitations
  4. prompt set generation for multiple directions
  5. mockup generation with a GPT-5-class image model
  6. asset organization in Google Drive, Airtable, or Notion via n8n/Make
  7. human review gate before second-round exploration
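
The review gate in step 7 can be as simple as refusing to start round two until every direction has a recorded decision. A minimal sketch, where the `Decision` values and both helper functions are hypothetical:

```python
from enum import Enum


class Decision(Enum):
    APPROVE = "approve"
    REJECT = "reject"
    ANOTHER_ROUND = "another_round"


def gate_passed(directions: list[str], decisions: dict[str, Decision]) -> bool:
    """The gate stays closed until every direction has a recorded decision."""
    return all(name in decisions for name in directions)


def directions_for_second_pass(decisions: dict[str, Decision]) -> list[str]:
    """Only directions a human explicitly sent back get regenerated."""
    return [name for name, d in decisions.items() if d is Decision.ANOTHER_ROUND]


directions = ["coastal textures", "shell forms", "brushed silver"]
decisions = {
    "coastal textures": Decision.APPROVE,
    "shell forms": Decision.ANOTHER_ROUND,
}
open_before = gate_passed(directions, decisions)    # one direction undecided
decisions["brushed silver"] = Decision.REJECT
open_after = gate_passed(directions, decisions)     # all three decided
print(open_before, open_after, directions_for_second_pass(decisions))
```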

That is much less magical than “AI designs my product line.”

It is also the version that survives contact with production.

Why this matters for developers building agents

The lesson here is bigger than jewelry or design workflows.

If you’re building AI agents for any repeatable business process, the pattern is the same:

  • one model is rarely the best worker for every job
  • routing beats mega-prompts
  • explicit handoffs beat giant chat histories
  • human approval beats fake autonomy
  • predictable cost matters if the workflow runs constantly

That last point is why products like Standard Compute are interesting for agent builders.

If you’re wiring together OpenClaw, n8n, Make, Zapier, or your own background workers, the hard part is not just getting good outputs.

It’s getting good outputs repeatedly without turning every automation into a billing event you have to babysit.

Unlimited AI compute with an OpenAI-compatible API is not just a pricing trick.
It changes what kinds of multi-step agent workflows are practical to run all day.

Final take

The useful creative assistant is not the one that gives you more ideas.

It’s the one that shows up tomorrow with:

  • research already collected
  • a brief already written
  • mockups already grouped
  • assets already organized
  • a clear place for a human to say yes or no

That’s not better prompting.
That’s better routing.

And honestly, once you see the difference, it’s hard to go back to one giant chat window pretending to be a workflow.
