Ebony Louis for Cloudinary

Posted on Jun 12 • Edited on Jun 30

What Makes An Agent Loop Useful?

#ai #webdev #agentloops #agentskills

Over the last few days, my feed has been filled with posts about agent loops.

The message is usually some variation of the same idea: stop prompting agents one step at a time and start designing systems that can act, evaluate their work, and continue toward a goal on their own. The future isn't us prompting agents all day. It's us designing loops that can keep making progress while we're doing something else, or even while we're asleep.

The idea itself wasn't new to me.

What I didn't understand was which parts of the loop actually mattered.

Was it the model? The tools? The memory? The trigger? The fact that it could run unattended?

Most of the examples I found explained the concept. Few explained how to think about designing one. And even fewer showed what happened when things went wrong.

So instead of reading another post about loops, I decided to build a simple one.

Using Claude Code, Cloudinary Skills, and Cloudinary MCP servers, I created a media optimization loop with a simple objective: reduce delivery weight for image assets by at least 20%.

There was only one question left to answer:

What Makes a Loop Useful?

Not every automation is a loop.

A useful loop needs eight pieces working together, arranged in a circle: trigger, goal, knowledge, action, evaluation, memory, decision, and repeat.

If one of those pieces is missing, the loop will start to break down.

Without a trigger, nothing starts. Without a goal, it doesn't know what success looks like. Without knowledge, the agent guesses. Without actions, it can only make recommendations. Without memory, every run starts from scratch. Without verification, it's very difficult to know whether the loop is actually making progress or simply producing output that looks reasonable.

That last point ended up becoming the most important lesson of the entire project. Of the eight pieces in that circle, one short chain running through the middle of it is what actually determines whether the loop succeeds. We'll dive more into that later.

The Goal Layer

Everything started with a simple goal:

Reduce image delivery weight by at least 20%

That objective determined what success looked like and ultimately drove every decision the loop made.

At first that sounds obvious. It wasn't. Later in the project I discovered that a surprisingly large amount of loop behavior comes down to how clearly you define the goal in the first place.

The Knowledge Layer

The agent needed both Cloudinary knowledge and knowledge about how the loop itself should operate.

I didn't want it guessing transformation syntax, trying to remember optimization best practices from training data, or inventing its own workflow every time it ran.

To solve that, I used Cloudinary Skills alongside a custom loop skill.

The Cloudinary Skills acted as the domain knowledge layer. In this case, the transformation-focused skill provided guidance on optimization strategies, transformation syntax, and Cloudinary best practices so the agent could make informed decisions without relying on memory or assumptions.

The custom loop skill defined the workflow itself. It told the agent how to read the goal, inspect memory, evaluate results, update state, and decide whether the loop should continue or stop.

Instead of telling the agent exactly what to do at every step, I gave it both reusable Cloudinary knowledge and a reusable workflow. The loop could then reason about the problem, take action, evaluate the outcome, and decide what to do next.

The Action Layer

Knowledge alone isn't enough. If the loop is going to do real work, it needs the ability to interact with your environment. That's where Cloudinary MCP servers came in.

The Asset Management MCP server allowed the loop to discover image assets larger than 1 MB, inspect their metadata, and prioritize the biggest optimization opportunities first.
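A minimal Python sketch of that discover-and-prioritize step, assuming the asset metadata has already been fetched into plain dictionaries. The field names, the sample data, and the 1,000,000-byte cutoff for "1 MB" are illustrative assumptions. In the actual loop, the MCP server handled the lookups.

```python
# Assumption: "larger than 1 MB" is treated as more than 1,000,000 bytes.
ONE_MB = 1_000_000


def prioritize_assets(assets, min_bytes=ONE_MB):
    """Keep assets above the size threshold, largest first."""
    eligible = [a for a in assets if a["bytes"] > min_bytes]
    return sorted(eligible, key=lambda a: a["bytes"], reverse=True)


# Made-up sample data, not assets from the experiment.
sample = [
    {"public_id": "example/a", "bytes": 2_400_000},
    {"public_id": "example/b", "bytes": 640_000},
    {"public_id": "example/c", "bytes": 5_100_000},
]

queue = prioritize_assets(sample)
print([a["public_id"] for a in queue])  # ['example/c', 'example/a']
```

Sorting by size first means that if the loop stops early, the assets it did process are the ones with the most bytes to save.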

The Environment Configuration MCP server allowed the loop to create reusable named transformations when it identified optimization patterns that could be applied across multiple assets. Rather than repeatedly generating the same transformation logic, the loop could register and reuse optimization strategies across future runs.
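To make the reuse concrete, here is a small sketch of how a named transformation shows up in a delivery URL. Cloudinary applies a named transformation with the `t_<name>` URL component. The cloud name, public ID, and transformation name below are hypothetical placeholders, not values from this project.

```python
def delivery_url(cloud_name, public_id, named_transformation=None):
    """Build an image delivery URL, optionally applying a named transformation."""
    parts = [f"https://res.cloudinary.com/{cloud_name}/image/upload"]
    if named_transformation:
        # Named transformations are referenced as t_<name> in the URL.
        parts.append(f"t_{named_transformation}")
    parts.append(public_id)
    return "/".join(parts)


# Hypothetical names for illustration only.
print(delivery_url("my-cloud", "example/photo.jpg"))
print(delivery_url("my-cloud", "example/photo.jpg", "loop_optimized"))
```

Once a pattern is registered under a name, every later run can reference that name instead of regenerating the same transformation string.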

Together, the Skills and MCP servers gave the loop both knowledge and action. The Skills helped the agent reason about what should happen. The MCP servers allowed it to actually do it.

What struck me most wasn't any individual MCP call. It was watching the loop move from reasoning to action. It wasn't simply recommending optimizations anymore. It could inspect my Cloudinary account, identify opportunities, and make changes based on what it found.

The Memory Layer

The loop also needed a way to remember what it had already done. I used a simple memory file that tracked:

Assets already evaluated
Previous run summaries
Named transformations that had been created
Current loop status

This allowed future runs to pick up where earlier runs left off rather than starting from scratch every time. The agent could see what had already been processed and focus only on remaining work.
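A memory file like this can be very small. The sketch below assumes a JSON file (the post describes it as one later on). The key names are hypothetical stand-ins for the four tracked items above, not the actual schema used in the experiment.

```python
import json
import tempfile
from pathlib import Path


def load_memory(path):
    """Return saved loop state, or a fresh state if no file exists yet."""
    p = Path(path)
    if not p.exists():
        return {
            "evaluated_assets": [],
            "run_summaries": [],
            "named_transformations": [],
            "status": "not_started",
        }
    return json.loads(p.read_text())


def save_memory(path, memory):
    """Write loop state back to disk."""
    Path(path).write_text(json.dumps(memory, indent=2))


def remaining_assets(asset_ids, memory):
    """Skip anything a previous run already evaluated."""
    done = set(memory["evaluated_assets"])
    return [a for a in asset_ids if a not in done]


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "memory.json"
    memory = load_memory(path)              # first run: fresh state
    memory["evaluated_assets"].append("example/a")
    memory["status"] = "in_progress"
    save_memory(path, memory)
    later = load_memory(path)               # later run: picks up saved state
    print(remaining_assets(["example/a", "example/b"], later))  # ['example/b']
```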

With the goal, knowledge, action, and memory layers in place, the loop could do real work and remember what it had done. What it couldn't yet do was tell whether that work was any good.

The Evaluation Layer (Where I Got It Wrong)

The first version of the loop looked successful.

It analyzed assets, generated optimization strategies, and reported an average savings of nearly 68%. If I had stopped there, I probably would have considered the project finished.

The problem was that those savings were estimates. The loop wasn't actually verifying its work. It was looking at dimensions, formats, and known optimization patterns and making educated guesses about how much optimization should be possible. The numbers looked reasonable.

Here's what was actually going on. My original loop instructions told the agent to evaluate whether an optimization met the savings target, but never specified how. So the agent did what any reasonable agent would do when given an ambiguous instruction: it estimated. It knew the typical savings associated with common optimization strategies and used that knowledge to produce a confident-sounding number.

The fix wasn't a smarter model. It was a more precise instruction.

I added one section to the loop command:

## Step 8 — Evaluate results

For each asset processed, determine:
- Estimated or measured optimized size (bytes)
- Percent savings = (original - optimized) / original × 100
- Did this asset meet the target savings threshold?
- Confidence level: high (measured), medium (estimated from known asset type),
  low (estimated with significant uncertainty)

That's it. I didn't tell the agent how to measure anything. I just told it that "estimated" and "measured" were different things, and that it needed to label which one it was giving me.

That single instruction changed the entire run. The agent started actually fetching the optimized URLs and comparing real byte sizes against the originals, because it now had to be honest about which category its number fell into.
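The Step 8 arithmetic and labeling are simple enough to write down directly. This is a sketch of the rule, not the agent's actual code: the function and field names are hypothetical, the byte counts are made up, and the 20% default comes from the goal stated earlier.

```python
from dataclasses import dataclass


@dataclass
class Evaluation:
    percent_savings: float
    meets_target: bool
    confidence: str  # "high", "medium", or "low"


def evaluate(original_bytes, optimized_bytes, measured,
             known_asset_type=True, target=20.0):
    """Apply the Step 8 formula and label how trustworthy the number is."""
    if original_bytes <= 0:
        raise ValueError("original_bytes must be positive")
    savings = (original_bytes - optimized_bytes) / original_bytes * 100
    if measured:
        confidence = "high"      # real delivered bytes were compared
    elif known_asset_type:
        confidence = "medium"    # estimated from a known asset type
    else:
        confidence = "low"       # estimated with significant uncertainty
    return Evaluation(round(savings, 1), savings >= target, confidence)


# Made-up byte counts for illustration.
print(evaluate(1_000_000, 387_000, measured=True))
# Evaluation(percent_savings=61.3, meets_target=True, confidence='high')
print(evaluate(1_000_000, 900_000, measured=False, known_asset_type=False))
# Evaluation(percent_savings=10.0, meets_target=False, confidence='low')
```

The formula is identical for estimated and measured inputs. What changes is the label, which is what forces the distinction into the open.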

Verification Changed Everything

Here's the same batch of assets, estimated savings versus measured savings, after I added the evaluation step:

Almost nothing landed where the estimate said it would.

The kitten GIF is the headline number: predicted at 70%, measured at 0%. I dug into it and found the test request wasn't negotiating WebP correctly, which caused Cloudinary's automatic format selection to fall back to GIF delivery. The optimization strategy wasn't wrong. The measurement exposed a gap between what should happen and what actually happened on that request.
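Automatic format selection depends on what the request says it can accept, so a measurement script needs to send a browser-like `Accept` header. The sketch below uses only Python's standard library. The header value is an assumption about what a WebP-capable client might send, and the post doesn't say how the original test request was made.

```python
import urllib.request

# Assumption: a browser-like Accept header that advertises WebP support.
BROWSER_LIKE_ACCEPT = "image/avif,image/webp,image/*,*/*;q=0.8"


def build_request(url, accept=BROWSER_LIKE_ACCEPT):
    """Create a request that advertises which image formats it accepts."""
    return urllib.request.Request(url, headers={"Accept": accept})


def measure_delivered_bytes(url, accept=BROWSER_LIKE_ACCEPT):
    """Fetch the asset and return (byte count, Content-Type header)."""
    with urllib.request.urlopen(build_request(url, accept), timeout=30) as resp:
        body = resp.read()
        return len(body), resp.headers.get("Content-Type")
```

Checking the returned `Content-Type` alongside the byte count is what would reveal a case like this one, where the response came back as a GIF even though a smaller format was expected.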

But the GIF isn't the real story. Look at food/spices: estimated at 30%, measured at 61.3%. Look at breakfast: estimated at 53%, measured at 77.3%. These weren't close. Even with the GIF set aside, the agent's estimates were off by 5 to 31 percentage points, in both directions, across almost every asset in the batch.

That's the part that actually changed how I think about this. The estimates weren't just imprecise. They were unreliable, and not in a way you could correct for with a fudge factor, because sometimes the real number was way better than predicted and sometimes it was worse. Without verification, there was no way to know which assets fell into which bucket. The loop would have reported a single confident average and moved on, and I would have believed it.
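A quick worked check of the fudge-factor point, using only the three assets whose numbers are quoted above (the full batch was larger). The "best constant correction" here is simply the mean gap, which is an illustrative choice.

```python
# (estimated %, measured %) for the three assets quoted in the text.
results = {
    "kitten GIF": (70.0, 0.0),
    "food/spices": (30.0, 61.3),
    "breakfast": (53.0, 77.3),
}

# Signed gap in percentage points: positive means reality beat the estimate.
gaps = {name: round(measured - estimated, 1)
        for name, (estimated, measured) in results.items()}
print(gaps)
# {'kitten GIF': -70.0, 'food/spices': 31.3, 'breakfast': 24.3}

# Try the best single correction (the mean gap) and see what is left over.
correction = round(sum(gaps.values()) / len(gaps), 1)
residuals = {name: round(gap - correction, 1) for name, gap in gaps.items()}
print(correction)   # -4.8
print(residuals)
# {'kitten GIF': -65.2, 'food/spices': 36.1, 'breakfast': 29.1}
```

Because the gaps have opposite signs, the constant correction leaves the GIF about as wrong as before and makes the other two worse.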

With the Step 8 instruction in place, the loop now generates optimized Cloudinary URLs, requests the transformed assets, measures the actual delivered byte sizes, and compares those against the originals, flagging low-confidence results so I know where to look first.

The Loop Was Right. I Was Wrong.

After the first run with real measurements, the loop reported that seven eligible assets still remained.

My first reaction was, why didn't it keep going?

I went back and looked at the goal. The goal wasn't:

Optimize every asset larger than 1MB.

It was:

Reduce delivery weight by 20%.

Those sound similar. They produce completely different behavior.

By the time the loop stopped, the measured average savings across the assets it had processed was already well past 20%. The goal had been met. From the loop's perspective, the work was done. Processing the remaining seven assets wouldn't have made the result any more true.

The loop wasn't confused. I was.

I had written down one goal and was carrying around a different one in my head. The loop followed the one that was actually written down, the one it could check against a real number. It stopped exactly when it was supposed to.

A loop will faithfully pursue the goal you define, not the goal you intended.
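The difference between the two goals is easy to see when each is written as a stop condition. This is an illustrative sketch with hypothetical function names. The savings figures reuse two measured values quoted earlier, and the seven remaining asset IDs are placeholders.

```python
def average(values):
    return sum(values) / len(values) if values else 0.0


def stop_for_written_goal(measured_savings, remaining, target=20.0):
    """'Reduce delivery weight by 20%': stop once the measured average clears it."""
    return average(measured_savings) >= target


def stop_for_intended_goal(measured_savings, remaining, target=20.0):
    """'Optimize every asset larger than 1MB': stop only when nothing is left."""
    return len(remaining) == 0


measured = [61.3, 77.3]                             # two measured values from the post
remaining = [f"placeholder/{i}" for i in range(7)]  # seven assets still eligible

print(stop_for_written_goal(measured, remaining))   # True
print(stop_for_intended_goal(measured, remaining))  # False
```

Same state, opposite answers. The loop only ever evaluated the first function.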

The Chain That Was Actually Doing the Work

Looking back, the GIF that measured 0% and the seven assets the loop left alone are the same lesson told twice.

Earlier I showed the eight pieces a useful loop needs: trigger, goal, knowledge, action, evaluation, memory, decision, repeat. That's the architecture, the parts you need to assemble. But it's not the part that actually drives progress toward the goal. That part is smaller: Measure → Verify → Decide.

If the eight pieces are the architecture, this is the engine. Everything else in this project, the trigger, the Skills, the MCP servers, the memory file, exists to support this chain. But this is the part that makes something a loop instead of a script that runs once and reports back.

A script can do the Action step. Only a loop can do Measure → Verify → Decide, and use the answer to decide whether to repeat. The measurement gave the loop a number in the same units as the goal. Verification was the comparison between that number and the target. Decision was what the loop did with the answer: stop, because the goal was met, even though work remained.

Take away the measurement, and the loop has nothing to verify. Take away verification, and the decision is just a guess wearing a confident tone. Take away the decision, and you have an agent that acts but never knows when to stop.
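Here is a minimal sketch of that chain as code, under stated assumptions: `measure` is a hypothetical callable that returns measured percent savings for one asset, and the goal is checked against the running average of processed assets, as described above. The asset names and numbers are made up.

```python
def run_loop(assets, measure, target=20.0):
    """Process assets until the measured average meets the target or work runs out."""
    pending = list(assets)
    savings = []
    while pending:
        asset = pending.pop(0)
        savings.append(measure(asset))           # Measure: same units as the goal
        avg = sum(savings) / len(savings)
        goal_met = avg >= target                 # Verify: compare against the target
        if goal_met:                             # Decide: stop even if work remains
            return {"status": "goal_met", "average": avg, "remaining": pending}
    avg = sum(savings) / len(savings) if savings else 0.0
    return {"status": "exhausted", "average": avg, "remaining": []}


# Made-up measurements standing in for real delivered-byte comparisons.
fake_measurements = {"a": 10.0, "b": 40.0, "c": 55.0}
result = run_loop(["a", "b", "c"], fake_measurements.get)
print(result)
# {'status': 'goal_met', 'average': 25.0, 'remaining': ['c']}
```

Remove the `measure` call and there is nothing to compare. Remove the comparison and the `if` has nothing to branch on. Remove the `if` and the loop runs until the list is empty, whether or not the goal was met.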

The Loop, In Practice

Stepping back, here's what each layer actually was in this project:

Trigger: a command I run manually in Claude Code
Goal: a markdown file stating the target savings percentage
Knowledge: Cloudinary Skills and a custom loop skill, so the agent didn't have to guess either Cloudinary best practices or the workflow itself
Action: Cloudinary MCP servers, so the agent could inspect and modify a real account
Memory: a JSON file tracking what's been done and what's left
Evaluation: a comparison between the original and optimized byte sizes, measured against the goal
Decision: the agent comparing the evaluation result to the goal and deciding to stop, continue, or flag for review

None of these layers are exotic. A markdown file, a JSON file, and two categories of tooling Cloudinary already publishes. What made it a loop wasn't the sophistication of any single piece. It was that each layer fed the next one something real, and that the chain in the middle, Measure → Verify → Decide, was actually wired up.

What I Didn't Build (Yet)

This is intentionally a small loop. It doesn't use subagents, worktrees, or orchestration frameworks.

Those things help loops scale, run faster, or handle more complexity, but they aren't what make something a loop in the first place.

The trigger is also still manual. I start the loop by running a command in Claude Code, and in a production version I'd likely replace that with a schedule or an event such as a new asset upload.

In other words, this isn't yet the "run while I sleep" version of a loop that many people are talking about. It's a deliberately small implementation designed to answer a simpler question:

What actually matters when you're designing a loop?

If you'd like to explore the implementation yourself, I've shared the files used in this experiment:

Closing Thoughts

Going into this project, I wanted to understand what actually matters when you're designing an agent loop.

I thought the answer might be the model, the tools, the memory layer, or the fact that the loop could run unattended.

What I learned was that those things matter, but mostly because they support something else.

A useful loop needs a clear goal, a way to measure progress against that goal, and a way to decide what to do next based on the result.

The GIF that measured 0% savings and the seven assets the loop left untouched both pointed to the same lesson: a loop is only as useful as its ability to determine whether it's actually making progress toward the objective it was given.

Everything else, including tools, MCP servers, Skills, subagents, and orchestration, helps a loop scale. What makes it useful is its ability to measure progress against a goal and use that information to decide what to do next.


Top comments (3)

Eleftheria Batsou • Jun 14

Thanks for this.

The piece I'd add: a loop is only as safe as the environment it runs in unattended. "Act, evaluate, continue" is great until the act step has unrestricted reach and the evaluate step misses something. The loops that actually run overnight without supervision are the ones where the worst-case action is bounded by what the environment allows, not by the loop's own judgment.

Ebony Louis Cloudinary • Jun 15

That's a really great point!

This loop was relatively low risk since the actions were mostly inspecting assets and creating transformations. But the more powerful the action layer gets, the more important those guardrails become.

So I 100% agree. I like the way you phrased it: the worst-case outcome should be bounded by the environment, not by the loop's own judgment.
