
Leo

Posted on • Originally published at cicd.deployment.to

Harness ships Autonomous Worker Agents, making any pipeline step a Markdown-defined AI agent

The first time I let a scripted CI step do something that felt "smart", I regretted it the same afternoon. It was a shell block that inferred a version bump from a commit message and then tagged the wrong branch. Nothing rolled back cleanly. The blame was mine, honestly, because the script had no identity, no sandbox, no policy check, and no audit trail beyond a build log I had to grep after the fact. That memory is the lens I read this Harness announcement through.

What Harness actually shipped

Harness launched Autonomous Worker Agents on Tuesday. The headline mechanic is simple. Any step in a Harness delivery pipeline that used to be a fixed script can now run as an AI agent instead. Deployment, testing, and security scans are the examples the company leads with.

Harness has had "expert agents" available for a while, the ones that live in a chat window or IDE and help you author pipelines. This new tier moves the agent inside the pipeline. It runs as a step, on the same delegate infrastructure Harness has been running scripted steps on for years, and it is governed by the same audit and policy controls Harness customers already use for humans and scripts.

CEO and founder Jyoti Bansal told The New Stack that "building an agent is becoming easier and easier and easier, but the harness is where the hard work is." That is roughly the pitch. The model is the cheap part. The runtime around it is not.

The agent-file trick

The part that made me sit up: agents are defined as a single Markdown file. You write the instructions in plain English. Harness calls the shape an "agent-file format that has become standard across the industry", and if a team does not want to author the file by hand, Harness AI will generate one. Once the agent is registered, it draws on the Harness Software Delivery Knowledge Graph, the company's internal map of a customer's services, pipelines, deployments, incidents, and security findings.

Whether "standard across the industry" survives contact with other vendors is a separate question, and I will come back to it below. As a DX affordance, though, "a pipeline step is now a paragraph of English in a Markdown file" is a real change in how a pipeline is authored. If you have ever tried to explain a fifty-line YAML matrix to a new hire, you already know why.
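To make "a paragraph of English in a Markdown file" concrete, here is a minimal Python sketch of how a runtime might split such a file into metadata and instructions. The file layout, the `name` and `tools` fields, and the `parse_agent_file` helper are all my own assumptions for illustration. None of them come from Harness documentation, and the real agent-file format may look different.

```python
from dataclasses import dataclass, field

# Hypothetical agent file. The frontmatter keys are invented for this
# sketch and are NOT taken from any vendor's published format.
AGENT_FILE = """\
---
name: canary-verifier
tools: deploy-status, metrics-query
---
Check the canary deployment for the service named in the pipeline context.
If the error rate is above the agreed threshold, recommend a rollback.
"""


@dataclass
class AgentDefinition:
    metadata: dict = field(default_factory=dict)
    instructions: str = ""


def parse_agent_file(text: str) -> AgentDefinition:
    """Split optional '---' frontmatter from the plain-English body."""
    lines = text.splitlines()
    metadata = {}
    body_start = 0
    if lines and lines[0].strip() == "---":
        for i in range(1, len(lines)):
            if lines[i].strip() == "---":
                # Closing delimiter found: the body starts on the next line.
                body_start = i + 1
                break
            key, sep, value = lines[i].partition(":")
            if sep:
                metadata[key.strip()] = value.strip()
        else:
            # No closing delimiter: treat the whole file as instructions.
            metadata = {}
            body_start = 0
    instructions = "\n".join(lines[body_start:]).strip()
    return AgentDefinition(metadata=metadata, instructions=instructions)
```

The design point is that the instructions stay plain prose, while anything a machine has to act on sits in a small, separate header. That header is exactly where vendor-specific drift would show up.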

The runtime around each agent

The important bit for anyone running production pipelines is the runtime shape:

  • Where it runs. Agents run on delegates, the Harness components that live inside a customer's own infrastructure, not inside the Harness cloud.
  • How it is isolated. Each agent runs in a sandboxed container with restricted file and network access.
  • Who it is. Each agent gets its own identity and permissions.
  • What can stop it. The same policy engine that gates human deployments gates agent deployments.

Auditability is baked in. An agent step records what triggered it, the prompt it ran on, how many turns it took, and what it produced. Harness already logged that shape of information for scripted steps, so an agent becomes just another logged action on the pipeline timeline. Several startups sell agent auditability as a standalone product; Harness is trying to make that a feature of the platform for its own customers.
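As a rough illustration of what "just another logged action" could mean, here is a sketch of an audit record carrying the fields described above. The `AgentStepAudit` class, its field names, and the JSON-lines output are my assumptions. Harness has not published this schema as far as I know.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class AgentStepAudit:
    """Hypothetical audit record for one agent step (illustrative only)."""
    agent_identity: str  # the identity the agent ran as
    trigger: str         # what started the step
    prompt: str          # the prompt the agent ran on
    turns: int           # how many model turns it took
    output: str          # what the agent produced


def to_log_line(record: AgentStepAudit) -> str:
    """Serialise the record as one JSON line with stable key order."""
    return json.dumps(asdict(record), sort_keys=True)
```

Freezing the dataclass is a small nod to the point of an audit trail: a record should be written once and never edited afterwards.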

Token budgets as a first-class control

The other piece I appreciated is that token budgeting is part of the primitive, not an afterthought. Worker Agents track token spend per agent and per pipeline, and budget caps can pause a run for approval before it "gets out of hand", to use Bansal's phrase. Anyone who has watched an autonomous loop happily burn through a monthly budget over a weekend will read that sentence with a small sigh of relief. It is not a solved problem, but the shape is right. The layer that gates policy and audit also gates spend.
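The control itself is easy to picture. Below is a minimal sketch of per-agent and per-pipeline caps that return a pause decision once either is exceeded. The `TokenBudget` class, the cap values in the usage note, and the `Decision` enum are hypothetical; this is my reading of the described behaviour, not Harness's implementation.

```python
from dataclasses import dataclass, field
from enum import Enum


class Decision(Enum):
    CONTINUE = "continue"
    PAUSE_FOR_APPROVAL = "pause_for_approval"


@dataclass
class TokenBudget:
    """Hypothetical budget tracker: one cap per agent, one per pipeline."""
    per_agent_cap: int
    pipeline_cap: int
    spent_by_agent: dict = field(default_factory=dict)

    @property
    def pipeline_spent(self) -> int:
        # Pipeline spend is the sum of every agent's spend in this run.
        return sum(self.spent_by_agent.values())

    def record(self, agent: str, tokens: int) -> Decision:
        """Add usage for an agent, then check both caps."""
        self.spent_by_agent[agent] = self.spent_by_agent.get(agent, 0) + tokens
        over_agent = self.spent_by_agent[agent] > self.per_agent_cap
        over_pipeline = self.pipeline_spent > self.pipeline_cap
        if over_agent or over_pipeline:
            return Decision.PAUSE_FOR_APPROVAL
        return Decision.CONTINUE
```

With `TokenBudget(per_agent_cap=1000, pipeline_cap=1500)`, two agents can each stay under their own cap and still trip the pipeline cap together, which is the weekend-runaway case the pause is meant to catch.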

The marketplace and its trust gradient

Harness paired the framework with an Agent Marketplace that opens with dozens of prebuilt agents in three tiers: Harness Managed agents that Harness builds and backs with an SLA, Harness Certified agents built by partners and reviewed by Harness, and community agents anyone can publish. Every agent can be forked. Community agents are open source, which means a team can read one before it runs and adjust it to fit.

Bansal is candid about the community tier. Take a community agent, he says, "with a grain of salt". For a large enterprise, he argues, the gate matters more than the catalog: a company can approve a small set of managed agents for production and block the rest.
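A gate like that could be as small as an allowlist check. The sketch below is an assumption about how one might express it: the `Tier` enum mirrors the three marketplace tiers, but `may_run_in_production`, the default of allowing only managed agents, and the agent names in the usage note are mine, not Harness's policy engine.

```python
from enum import Enum


class Tier(Enum):
    MANAGED = "managed"
    CERTIFIED = "certified"
    COMMUNITY = "community"


def may_run_in_production(
    agent_name: str,
    tier: Tier,
    approved_agents: frozenset,
    allowed_tiers: frozenset = frozenset({Tier.MANAGED}),
) -> bool:
    """Allow an agent only if its tier is permitted AND it is explicitly approved."""
    return tier in allowed_tiers and agent_name in approved_agents
```

Given `approved = frozenset({"rollback-advisor"})`, a managed `rollback-advisor` passes, while a community agent with the same name is blocked because its tier is not allowed. Requiring both conditions means a forked copy does not inherit the approval of the agent it was forked from.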

How other pipeline platforms are approaching the same problem

Harness is not alone here, and the announcement itself calls out two peers. Earlier this year GitHub shipped a preview of Agentic Workflows, which runs Markdown-defined agents inside GitHub Actions and uses the same agent-file convention Harness uses. GitLab's Duo Agent Platform lets teams build their own agents across the software lifecycle, a broader remit than the step-level primitive Harness is shipping. Beyond those two, coding-agent vendors are pushing downstream from the editor into build and deploy.

What matters for a team choosing today is which side of the trust line between the editor and the production pipeline you are already on, and how much of your policy, identity, and audit surface already lives with the platform you would have to trust to run these agents against production. If you already run a scripted delivery platform end to end, agents-as-steps is a small step. If your agents currently live in an editor and you are considering pushing them further right, the trust bar climbs quickly.

What I am watching next

Two things. First, whether the "agent-file format" claim actually stabilises into a portable convention or whether every vendor's Markdown file quietly drifts into vendor-specific frontmatter and tool wiring. Portability is where DX either wins or dies over the next year. Second, whether the token-budget and audit primitives get exercised in anger before they get relaxed for convenience. It is easy to ship pause-for-approval on day one and quietly loosen it a quarter later once engineers start routing around it. If Harness holds that line, the "harness" name will have earned its second decade.
