João Pedro Silva Setas
Most AI agent demos optimize for capability. Production buyers pay for control.

Every week I see a new AI agent demo.

Book the meeting. Send the email. Refactor the code. Triage the ticket. Trade the stock. Run the company.

The demos are getting better. Some of them are genuinely impressive.

But most of them are optimized for the wrong buyer.

They are optimized for the person watching the demo, not for the person who has to run the system after the demo.

That second person usually cares about different questions.

  • What can this thing access?
  • What happens when it gets stuck?
  • How do I approve risky actions?
  • What did it actually do?
  • How do I stop it?
  • How do I roll it back?
  • Which secrets can it touch?
  • How do I explain its behavior to my team?

That is the real product surface.

Not just capability. Control.

Capability gets the screenshot. Control gets the budget.

I think a lot of the current agent market is repeating a familiar pattern.
The first wave proves that the interaction is possible. You show that an LLM can use tools, keep context, and complete a multi-step task. That gets attention fast because it feels new.

Then reality shows up.

The agent fails halfway through a workflow. Or it retries a step six times and burns API credits. Or it drafts something a human should have reviewed first. Or it keeps running after the useful part is already done. Or it touches a system that should have been off-limits.

At that point, the question changes.

It is no longer, "Can this agent do the task?"

It becomes, "Can I trust this system in an environment that matters?"

That is where most demos stop.

Observability is necessary. It is not the whole answer.

A lot of products respond to this by adding better visibility.

You get traces, timelines, logs, token counts, screenshots, and event streams. I like all of that. You need it.

But observability on its own is still passive.

It tells you what happened.

Production users usually need more than that. They need ways to shape what is allowed to happen in the first place.

Watching an agent fail in high resolution is still failure.

The control plane is the part that turns visibility into operational trust.

The control-plane primitives I think matter

If I were evaluating an agent platform for real work, these are the things I would care about first.

1. Narrow permissions by default

An agent should not wake up with broad access to everything.

It should have access to exactly the tools, environments, and credentials required for the job. Nothing more.

If the task is reading support tickets, it does not also need production deploy access.

If the task is drafting copy, it does not also need billing permissions.

The default should be small blast radius.
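Here is roughly what I mean, sketched in Python. The names (`ToolScope`, the tool ids) are mine for illustration, not any real API — the point is default-deny:

```python
# Sketch of default-deny tool scoping for an agent task.
# Everything is illustrative; no real platform API is implied.
from dataclasses import dataclass, field

@dataclass(frozen=True)
class ToolScope:
    """Explicit allowlist of tools a single agent run may invoke."""
    task: str
    allowed_tools: frozenset = field(default_factory=frozenset)

    def check(self, tool: str) -> None:
        # Default-deny: anything not explicitly granted is rejected.
        if tool not in self.allowed_tools:
            raise PermissionError(
                f"task {self.task!r} is not allowed to call {tool!r}")

# A ticket-triage agent gets ticket tools only.
# Production deploy access is not denied by a rule; it simply does not exist here.
triage = ToolScope("triage-tickets",
                   frozenset({"tickets.read", "tickets.comment"}))
triage.check("tickets.read")           # within scope, passes silently
try:
    triage.check("deploy.production")  # outside scope, rejected
except PermissionError as e:
    print(e)
```

The useful property is that the blast radius is defined at grant time, not discovered at incident time.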

2. Review points for expensive or risky actions

The most important feature in an autonomous system is often a well-placed pause.

Some actions should be automatic. Some should require a human checkpoint.

That could mean spending above a threshold, writing to production systems, touching customer data, or sending something externally.

I do not see human review as a weakness in the product. I see it as part of the product.
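A checkpoint like that can be a few lines of policy. This is a sketch with made-up thresholds and action names, assuming the runtime lets you wrap actions with a gate:

```python
# Sketch of a human checkpoint: actions above a risk threshold, or on a
# sensitive list, pause for approval instead of executing automatically.
# Thresholds and action names are hypothetical.
AUTO_APPROVE_LIMIT_USD = 50.0
EXTERNAL_ACTIONS = {"email.send_external", "payments.charge"}

def requires_review(action: str, cost_usd: float = 0.0) -> bool:
    """Return True when the action should pause for a human."""
    return cost_usd > AUTO_APPROVE_LIMIT_USD or action in EXTERNAL_ACTIONS

def run(action, cost_usd, execute, request_approval):
    # request_approval is whatever your review channel is (UI, Slack, queue).
    if requires_review(action, cost_usd) and not request_approval(action):
        return "held"     # the pause is the feature, not a failure mode
    return execute()

# Cheap internal action runs straight through; external send is held.
print(run("tickets.comment", 0.0, lambda: "done", lambda a: False))
print(run("email.send_external", 0.0, lambda: "sent", lambda a: False))
```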

3. Auditability that is actually useful

I want more than a generic activity log.

I want to know which tool was called, under which boundaries, and what happened next.

If something goes wrong, I should be able to reconstruct the path without guessing.

That matters for debugging. It also matters for trust inside a team.
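Concretely, "useful" means one structured record per step: tool, boundary, outcome, all in one place. A minimal sketch, with field names I made up:

```python
# Sketch of an audit record that ties tool, scope, and outcome together,
# so a failed run can be reconstructed step by step. Field names are illustrative.
import json
import time

def audit_record(run_id, step, tool, scope, outcome, detail=""):
    return {
        "run_id": run_id,
        "step": step,
        "ts": time.time(),
        "tool": tool,        # which tool was called
        "scope": scope,      # under which boundaries
        "outcome": outcome,  # e.g. "ok", "denied", "error", "held"
        "detail": detail,
    }

log = [
    audit_record("run-42", 1, "tickets.read", "triage-tickets", "ok"),
    audit_record("run-42", 2, "deploy.production", "triage-tickets", "denied",
                 "tool outside task scope"),
]
# One JSON line per step: greppable, diffable, explainable to a teammate.
for rec in log:
    print(json.dumps(rec, sort_keys=True))
```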

4. Recovery and rollback strategy

People talk a lot about autonomous execution. They talk less about undo.

But if an agent edits configuration, changes data, triggers a workflow, or mutates state, rollback matters.

The system should not just be able to move forward. It should help me recover from a bad step without turning the whole incident into manual archaeology.
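One simple shape for this is a journal of compensating actions: every forward step records its own undo, and rollback replays them in reverse. A sketch, not a transaction system:

```python
# Sketch of forward steps paired with compensating undo steps, so a bad run
# can be rolled back in reverse order instead of cleaned up by hand.
class Journal:
    def __init__(self):
        self._undo = []

    def do(self, apply, undo):
        """Run a step and record how to reverse it."""
        result = apply()
        self._undo.append(undo)
        return result

    def rollback(self):
        # Undo in reverse order: the last change is reverted first.
        while self._undo:
            self._undo.pop()()

config = {"replicas": 2}
j = Journal()
# The agent scales a service; the journal remembers the previous value.
j.do(lambda: config.update(replicas=5),
     lambda: config.update(replicas=2))
j.rollback()          # bad step detected: restore prior state
print(config)         # back to the original configuration
```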

5. Credential boundaries

This one is boring, which is exactly why it matters.

Credentials should be isolated by environment, role, and task. Temporary access is better than broad standing access. Fine-grained scopes are better than one giant shared credential.

The more agentic the workflow becomes, the more this matters.
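The shape I want is short-lived and narrow by construction. This sketch assumes some broker mints credentials like this (real systems would use an STS-style token service); the class and field names are mine:

```python
# Sketch of short-lived, narrowly scoped credentials instead of one
# giant shared key. The issuing side is assumed; only the shape is shown.
import time
from dataclasses import dataclass

@dataclass(frozen=True)
class ScopedCredential:
    environment: str    # bound to one environment, e.g. "staging", never "*"
    role: str           # e.g. "ticket-reader"
    scopes: frozenset   # fine-grained scopes, not blanket access
    expires_at: float   # temporary by construction

    def valid_for(self, environment: str, scope: str) -> bool:
        return (time.time() < self.expires_at
                and environment == self.environment
                and scope in self.scopes)

# A 15-minute credential for reading tickets in staging.
cred = ScopedCredential("staging", "ticket-reader",
                        frozenset({"tickets:read"}),
                        expires_at=time.time() + 900)
print(cred.valid_for("staging", "tickets:read"))      # in scope
print(cred.valid_for("production", "tickets:read"))   # wrong environment
```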

6. Observability tied to action

Yes, I still want traces and telemetry.

But I want them connected to intervention. When I see a loop, I should be able to stop it. When I see cost drift, I should be able to tighten a boundary. When I see a repeated failure, I should be able to change how the runtime behaves.

Good observability should make intervention simpler, not just diagnosis prettier.
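The difference is telemetry that can trip a breaker, not just render a trace. A sketch of the loop case, with made-up names:

```python
# Sketch of telemetry wired to intervention: repeated identical tool calls
# trip a circuit breaker and halt the run, instead of only showing up
# in a trace viewer after the credits are gone.
from collections import Counter

class LoopBreaker:
    def __init__(self, max_repeats: int = 3):
        self.max_repeats = max_repeats
        self.calls = Counter()

    def observe(self, tool: str, args_key: str) -> None:
        """Record a call; halt the run once the same call repeats too often."""
        self.calls[(tool, args_key)] += 1
        if self.calls[(tool, args_key)] > self.max_repeats:
            raise RuntimeError(
                f"loop detected: {tool}({args_key}) repeated "
                f"{self.max_repeats + 1} times; halting run")

breaker = LoopBreaker(max_repeats=3)
for _ in range(3):
    breaker.observe("search.web", "q=pricing")   # within budget
try:
    breaker.observe("search.web", "q=pricing")   # fourth repeat trips it
except RuntimeError as e:
    print(e)
```

The same pattern generalizes: a cost counter instead of a call counter gives you a spend ceiling, and tightening the threshold is the "change how the runtime behaves" knob.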

This is why I think the category is really about trust

I do not think production users are buying "an agent" in the abstract.

They are buying a system they can trust around an agent.

That trust does not come from a benchmark.

It comes from constraints.

It comes from knowing that the runtime has boundaries. That risky actions can be reviewed. That behavior can be inspected. That failures can be contained. That humans can step in cleanly.

The winning products in this space will probably look less magical over time, not more.

They will feel more operational. More boring. More inspectable.

That is a good sign.

In infrastructure, boring is often what people pay for.

How I would position OpenClawCloud

This is the direction I find most interesting for OpenClawCloud.

Not "host your agents in the cloud" as a generic message.

That is too weak.

The stronger message is closer to this:

OpenClawCloud should be for teams that do not just want agents that can act. They want agents they can supervise.

The value is not raw autonomy.

The value is a managed runtime built around operational trust.

If I am a small team, I probably do not want to assemble review points, action history, credential isolation, recovery strategy, and runtime visibility from scratch around every agent workflow.

I want those concerns handled in one place.

That is the real operational burden.

And it is where I think the product story gets much stronger.

My take

Most agent demos today are selling capability.

I think production buyers are already looking for control.

That is where the serious budget goes.

If your product helps me trust an agent in a real environment, I will pay attention.

If it only helps me watch an impressive demo, I probably will not.

That is the lens I am using for OpenClawCloud.

If you are building or evaluating agent infrastructure, I think this is the right question to ask first:

What is the control plane here?

If you want to follow what I am building around that idea, take a look at clawdcloud.net.
