Lars Winstand

Posted on • Originally published at standardcompute.com

I read the r/openclaw garlic thread so you don’t have to, and yeah, the agent wasn’t the real problem

The viral r/openclaw story about an agent ordering 2 kg of garlic after roughly 3 months of successful grocery runs is funny.

It’s also a very normal automation failure.

Not "AI went rogue."
Not "LLMs can’t reason."
Not even really "bad prompts."

The likely failure mode was much more boring:

  • the user wanted 2 heads of garlic
  • the grocery UI defaulted to kilograms
  • OpenClaw selected 2 kg
  • nobody caught it before checkout

That’s the part worth paying attention to.

Because if you build agent workflows for OpenClaw, n8n, Make, Zapier, or custom MCP setups, this is the exact class of bug that shows up once an agent starts touching real money.

The original thread blew up for obvious reasons. It had the perfect shape of an AI story: absurd result, plausible setup, and just enough chaos to make everyone feel smarter than the person who trusted the automation.

But the comments were better than the headline.

A lot of people in the OpenClaw community immediately recognized the issue: not rebellion, just a unit mismatch plus too much trust in a workflow that had been "working fine" for months.

That’s a real engineering lesson.

The scary part is that nothing exotic happened

If OpenClaw had decided garlic was a strategic asset and bought 900 pounds of it, that would be easier to dismiss.

You’d blame model instability and move on.

But this was a normal tool-use problem:

  • a product page exposed quantity in an ambiguous way
  • the agent interpreted the page literally
  • the human skipped review because prior runs had succeeded
  • the system executed exactly the wrong thing, cleanly

That’s worse, because it generalizes.

If your workflow relies on browser automation, MCP tools, or structured extraction from messy retail UIs, this is the category of failure you should expect.

Not dramatic failures.

Boring ones.

The kind that slip through because they look valid.

The best pattern from the thread: draft the cart, don’t auto-checkout

The smartest comments all converged on the same design:

  1. let the agent gather context
  2. let the agent build the draft action
  3. require a human for the irreversible step

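As a rough sketch, that three-phase shape can be encoded so the irreversible transition is structurally impossible without an explicit human flag. Everything here (the types, names, and phases) is invented for illustration:

```typescript
// Hypothetical sketch of the draft-then-approve pattern.
// The agent owns the first two phases; only a human moves past "draft".

type Phase = "gathering" | "draft" | "approved" | "executed";

interface DraftOrder {
  phase: Phase;
  items: string[];
}

// Agent-side: build the draft, never execute.
function buildDraft(items: string[]): DraftOrder {
  return { phase: "draft", items };
}

// Human-side: the only transition that can reach "approved".
function approve(order: DraftOrder, humanConfirmed: boolean): DraftOrder {
  if (order.phase !== "draft" || !humanConfirmed) {
    throw new Error("Approval requires a draft order and an explicit human yes");
  }
  return { ...order, phase: "approved" };
}

// Execution refuses anything that skipped the approval step.
function execute(order: DraftOrder): DraftOrder {
  if (order.phase !== "approved") {
    throw new Error("Refusing to execute an unapproved order");
  }
  return { ...order, phase: "executed" };
}
```

The point is not the specific types; it is that the approval gate lives in the code path, not in a prompt the model can paraphrase away.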
That’s not a compromise. That’s the correct architecture.

One commenter described a custom HEB workflow that pulls weekly recipes, extracts ingredients, adds them to a cart, and then stops for manual review before purchase.

That is exactly how I’d build it too.

Here’s the tradeoff in plain terms:

| Workflow style | What you gain | What can go wrong |
|----------|----------|----------|
| Autonomous checkout | Maximum convenience, minimum human effort | Quantity errors, wrong substitutions, accidental purchases, high trust requirement |
| Reviewed cart | Most of the time savings with much better error containment | Slightly more friction, still needs a final human check |

If you’re building shopping automations, approval gates are not a sign of weakness. They’re the control surface.

When grocery automation is actually worth doing

One pushback in the thread was basically: why not just use subscriptions?

That’s fair for repeat purchases.

If your household buys the same coffee, paper towels, oat milk, and detergent every week, retailer-native subscriptions will beat an agent almost every time.

Less setup. Fewer moving parts. Better reliability.

But subscriptions break down once shopping starts depending on:

  • meal plans
  • pantry state
  • substitutions
  • budget constraints
  • inventory at a local store
  • dietary rules
  • one-off ingredients

That’s where an agent can help.

Not because buying groceries is inherently hard, but because turning messy planning into a draft cart is annoying and repetitive.

That’s a legitimate use case.

| Approach | Flexibility | Setup complexity | Failure modes |
|----------|----------|----------|----------|
| OpenClaw + browser automation or MCP tools | High: recipes, substitutions, local inventory, pantry logic | Higher: prompts, tools, testing, permissions | Unit mismatches, UI changes, bad selectors, approval mistakes |
| Store-native subscriptions | Low to medium: good for repeat purchases | Low: built into retailer flow | Weak handling of changing plans and edge cases |

So no, the garlic story does not prove grocery agents are useless.

It proves they need boundaries.

The part people ignore: recurring agent workflows get expensive fast

The garlic was the meme. The token bill is the real problem.

Across OpenClaw discussions, people keep running into the same thing:

  • big Claude Opus bills
  • session limits disappearing on heavy tasks
  • token burn from repeated tool use and page inspection

That makes sense.

Shopping is not a one-shot prompt.

A real workflow might:

  • read recipes
  • extract ingredients
  • compare product pages
  • check substitutions
  • call MCP tools
  • inspect browser state
  • retry when a selector breaks
  • summarize for approval

If you run that on a schedule, model spend becomes infrastructure, not experimentation.

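To make "spend becomes infrastructure" concrete, here is a back-of-envelope calculation. The run counts, token counts, and price per million tokens are illustrative assumptions, not real vendor pricing:

```typescript
// Back-of-envelope monthly model spend for a scheduled agent workflow.
// All numbers below are illustrative assumptions, not real pricing.

function monthlyModelCost(
  runsPerDay: number,
  tokensPerRun: number,
  pricePerMillionTokens: number
): number {
  const tokensPerMonth = runsPerDay * 30 * tokensPerRun;
  return (tokensPerMonth / 1_000_000) * pricePerMillionTokens;
}

// Example: one grocery run per day, ~200k tokens of tool calls and
// page inspection per run, at a hypothetical $15 per million tokens.
const cost = monthlyModelCost(1, 200_000, 15); // → 90 ($/month)
```

Even one daily workflow lands near $90/month at these assumed numbers, and it scales linearly with every retry loop and extra page inspection.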
Here’s the practical model tradeoff:

| OpenClaw setup | Reasoning quality | Cost predictability | Typical tradeoff |
|----------|----------|----------|----------|
| Frontier models like Claude Opus or GPT-5 | Best at ambiguous pages and messy reasoning | Worst if left unconstrained | Strong results, dangerous for always-on workflows |
| Local models like Qwen or Llama | Lower reasoning ceiling, weaker tool use in hard cases | More predictable, especially if you already own the hardware | Cheaper to run continuously, but struggles with genuinely hard cases |

My opinion: using Claude Opus for every grocery-cart decision is like driving a Formula 1 car to the mailbox.

Use frontier models for the genuinely hard judgment calls.
Use cheaper or local models for extraction, classification, repetitive browsing, and validation when you can.

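One simple way to encode that split is a trivial router that sends only the judgment calls to the expensive model. The model identifiers below are placeholders, not real endpoint names:

```typescript
// Hypothetical routing table: cheap, repetitive task types go to a
// local model; only genuine judgment calls reach the frontier model.
// Model identifiers are placeholders, not real API model names.

type TaskKind = "extraction" | "classification" | "browsing" | "judgment";

function pickModel(kind: TaskKind): string {
  switch (kind) {
    case "judgment":
      return "frontier-model"; // ambiguous pages, tricky substitutions
    default:
      return "local-small-model"; // extraction, classification, browsing
  }
}
```

The router itself is boring on purpose: the savings come from classifying tasks honestly, not from clever dispatch logic.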
And if you’re running agents continuously, per-token pricing gets annoying very quickly.

That’s one reason services like Standard Compute are interesting for this kind of workflow. If you’re building OpenAI-compatible automations and want predictable monthly cost instead of watching token spend every time an agent loops through product pages, flat-rate compute is a much better fit for always-on agents than traditional per-token billing.

That matters a lot more in production than it does in demos.

Why MCP makes this possible and dangerous at the same time

OpenClaw-style workflows are getting better because MCP gives LLM apps a standard way to talk to external tools.

That’s the real unlock.

Instead of hoping the model can infer everything from raw page text, you can expose capabilities directly:

  • browser actions
  • shopping APIs
  • custom scripts
  • inventory lookups
  • cart builders
  • internal resources

At a high level, an MCP server can expose tools, resources, and prompts over JSON-RPC.

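For context, an MCP tool call is carried as an ordinary JSON-RPC request; the `tools/call` method is part of the MCP spec, though the tool name and arguments below are invented for this example:

```json
{
  "jsonrpc": "2.0",
  "id": 1,
  "method": "tools/call",
  "params": {
    "name": "add_to_cart",
    "arguments": { "item": "garlic", "quantity": 2, "unit": "head" }
  }
}
```

Nothing in that envelope distinguishes a harmless lookup from an irreversible purchase. That distinction has to live in your tool design.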
That’s powerful.

It also means your design mistakes become executable.

A bad prompt is annoying.
A bad tool boundary is expensive.

If you expose purchase actions without approval gates, sanity checks, or permission limits, you are not building autonomy. You are building a very polite failure mode.

The hard question is never "can the agent click checkout?"

Of course it can.

The hard question is which actions should be:

  • available as tools
  • approval-gated
  • validated against quantity or price limits
  • blocked entirely

That’s true for groceries.
It’s even more true for finance, reimbursements, server admin, and internal ops.

Four controls I’d add before letting an agent buy anything

This is the practical part.

If you insist on letting OpenClaw or any browser agent build carts, put these controls in first.

1) Keep checkout behind an approval gate

Cart creation is agent-friendly.
Payment is not.

A simple pattern:

```typescript
async function checkout(cart: Cart, approved: boolean) {
  if (!approved) {
    throw new Error("Checkout blocked: human approval required");
  }

  return submitOrder(cart);
}
```

If your workflow engine is n8n or Make, this can just be a manual approval step before the final HTTP request or browser action.

2) Normalize units before purchase

Do not trust raw UI labels.

If the recipe says "2 heads garlic" and the cart says "2 kg garlic", your workflow should stop.

Example validation layer:

```typescript
type IngredientRequest = {
  item: string;
  quantity: number;
  unit: "head" | "kg" | "g" | "lb" | "item";
};

// Block any cart line whose unit differs from what was requested,
// not just garlic.
function validateUnit(request: IngredientRequest, cart: IngredientRequest) {
  if (request.item === cart.item && request.unit !== cart.unit) {
    throw new Error(
      `Unit mismatch for ${request.item}: requested ${request.quantity} ${request.unit}, got ${cart.quantity} ${cart.unit}`
    );
  }
}
```

Even a dumb validator catches a surprising number of expensive mistakes.

3) Narrow permissions hard

If your agent only needs to search products and build a cart, don’t also give it broad account or payment capabilities.

The principle is simple: allowlist the smallest possible set of actions.

Example config shape:

```json
{
  "allowedTools": [
    "search_products",
    "get_product_details",
    "add_to_cart",
    "view_cart"
  ],
  "blockedTools": [
    "checkout",
    "update_payment_method",
    "change_delivery_address"
  ]
}
```

Same idea applies to MCP servers, browser permissions, and internal workflows.

4) Watch logs while the workflow matures

Don’t trust a workflow because it worked ten times.

Watch traces. Watch tool calls. Watch retries.

If you’re debugging OpenClaw directly, a command like this is the kind of thing you want running while testing:

```shell
openclaw logs --follow
```

For custom agents, log at least:

  • tool name
  • arguments
  • normalized quantity and unit
  • price before approval
  • retry count
  • final action selected

Example:

```json
{
  "tool": "add_to_cart",
  "item": "garlic",
  "requested_quantity": 2,
  "requested_unit": "head",
  "cart_quantity": 2,
  "cart_unit": "kg",
  "status": "blocked_by_validator"
}
```

That kind of log turns a meme into a fix.

My take: this was not an OpenClaw problem so much as an automation design problem

OpenClaw did not need to become sentient to create a bad result.

It just needed:

  • an ambiguous interface
  • a missing unit normalization step
  • too much trust
  • no approval gate at the point of payment

That pattern shows up everywhere.

If you’re building agents that touch money, inventory, infra, or customer data, the winning design is usually not full autonomy.

It’s draft autonomy.

Let the agent do the annoying 90%:

  • read recipes
  • extract ingredients
  • compare products
  • build the HEB cart
  • check stock
  • prepare the purchase

Then put one human checkpoint in front of the irreversible step.

That sounds less futuristic than "my agent buys groceries for me."

It also sounds a lot less like 40 heads of garlic.

If you’re building this stuff for real

If your agents run all day inside OpenClaw, n8n, Make, Zapier, or custom OpenAI-compatible workflows, the biggest production problems are usually not the flashy ones.

They’re things like:

  • hidden unit mismatches
  • brittle UI assumptions
  • missing approval gates
  • runaway model cost on recurring tasks

That last one is why predictable compute matters.

If your automation works but every loop through a browser session makes you think about token burn, you don’t really have an automation system. You have a meter running.

Standard Compute is worth a look if you want flat monthly pricing for OpenAI-compatible agent workloads instead of per-token cost anxiety. That model makes a lot more sense for recurring automations than paying as if every grocery page inspection were a special event.

The garlic thread was funny.

The lesson is not.

Treat agents like junior operators with fast hands, not flawless judgment.

You’ll ship better systems that way.
