Nikola Balic for Steel

Originally published at steel.dev

Browser Automation Built for Agents

Today we're launching the new Steel CLI and the steel-browser skill: browser automation redesigned from the ground up for agents.

Agents handle code, reasoning, and tool use well enough. Then they hit a real website and everything falls apart. Login walls, dynamic content, anti-bot systems, session state that doesn't persist. A five-minute human task becomes a twenty-minute debugging session.

This release targets that gap. It's also our first serious implementation of agent experience (AX): building tools where agents get clear inputs, predictable outputs, and failures they can recover from.

TL;DR

  • New agent skill and CLI
  • agent-browser integration
  • Stealth: captcha solving & proxies for agents
  • Run background browser sessions in parallel

What this actually does

The skill and CLI work together so agents can finish web tasks reliably:

  • Run multi-step flows through login and dynamic UI
  • Pull clean markdown from cluttered pages
  • Capture screenshots and PDFs as evidence
  • Handle anti-bot measures and CAPTCHAs
  • Maintain session state across longer runs
  • Return structured outcomes you can verify

A working agent web run should be boring. Predictable, reviewable, debuggable. The agent starts a browser session via the CLI, follows the skill contract, executes commands (open, snapshot, click, fill, type, wait), collects artifacts, and returns a status.

That's it. No heroics required.
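
The run described above can be sketched as one small shell function. Treat it as a minimal sketch, not the skill's actual contract: the function name `run_web_task`, the session name, and the URL are placeholders, and it assumes `steel browser snapshot` works without arguments. Interaction commands such as click and fill take element arguments that the CLI reference covers, so they are left out here.

```shell
#!/usr/bin/env bash
# Sketch of a "boring" agent web run. Assumes the steel CLI is installed
# and authenticated. run_web_task is a hypothetical helper, not part of Steel.
run_web_task() {
  local session="$1" url="$2"

  # Start (or attach to) a named session; give up early if that fails.
  steel browser start --session "$session" || return 1

  # Open the page and capture a snapshot, remembering the first failure.
  local status=0
  steel browser open "$url" && steel browser snapshot || status=$?

  # Always stop the session, then report the outcome to the caller.
  steel browser stop
  return "$status"
}
```

Calling `run_web_task demo-run https://example.com` returns a zero exit code only if both the open and the snapshot succeeded, which gives the agent a status it can check instead of output it has to interpret.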

SKILL.md: A contract, not a prompt

SKILL.md is a specification that tells an agent how to use a capability.

For web tasks, it tells the agent:

  • When to invoke the skill
  • How to execute the workflow
  • What shape the output takes
  • How to handle blockers

The goal is simple: less prompt glue, more repeatable behavior. Instead of rebuilding web handling for every project, your team adopts one stable path that carries across agents and harnesses.

What the steel-browser skill handles

The steel-browser skill is designed for autonomous web tasks where basic fetch tools fall short. Those tools fail on many sites because of anti-bot blocking, dynamic content, and login walls.

Anti-bot and CAPTCHA flow

When automation is blocked, the agent can use these patterns:

steel browser start --session checkout-bot-check --session-solve-captcha
steel browser open https://example.com
steel browser captcha solve --session checkout-bot-check

For automation-first sessions:

steel browser start --session checkout-bot-check --stealth

This is the difference between an agent that reports what it can't do and one that finishes the job.

The new Steel Browser CLI workflow

The CLI is the operator layer. It handles session lifecycle and gives agents and humans a clean interface for browsing work.

Session lifecycle, cloud or local

These commands cover the session lifecycle:

  • steel browser start creates or attaches a named session
  • steel browser live prints the live view URL for the active session
  • steel browser sessions lists sessions as JSON for scripting and agent loops
  • steel browser stop stops the active session
  • steel browser stop --all stops every session
  • steel browser start --api-url connects through a self-hosted endpoint
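
One way to combine these commands is a wrapper that guarantees teardown. This is a sketch under assumptions: `with_session` and `stop_all_sessions` are hypothetical helper names, and nothing is assumed about the JSON that `steel browser sessions` prints beyond saving it to a file.

```shell
#!/usr/bin/env bash
# Run any command inside a named session and always stop the session after.
# with_session is a hypothetical helper, not part of the Steel CLI.
with_session() {
  local session="$1"; shift
  steel browser start --session "$session" || return 1
  steel browser live            # print the live view URL so a human can watch
  local status=0
  "$@" || status=$?             # run the caller's command, keep its exit code
  steel browser stop            # stop the active session
  return "$status"
}

# Save the JSON session list for later inspection, then stop every session.
stop_all_sessions() {
  local log_file="$1"
  steel browser sessions > "$log_file"
  steel browser stop --all
}
```

For example, `with_session research steel browser open https://example.com` opens a page inside a session named `research` and stops that session whether or not the open succeeds.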

Passthrough commands for agent browser actions

The steel browser command forwards the actions it inherits from agent-browser, including open, snapshot, fill, click, and wait. Only the command prefix changes: an action you would run under agent-browser runs under steel browser instead.
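
If you already have scripts written against agent-browser, the prefix swap can be illustrated with a tiny rewrite helper. This is a sketch only: `to_steel` is a hypothetical function, and it assumes arguments pass through unchanged, which you should confirm against the CLI docs for the actions you use.

```shell
#!/usr/bin/env bash
# Rewrite an agent-browser command line into its steel browser equivalent.
# Only the leading command prefix changes; the arguments are left untouched.
# to_steel is a hypothetical helper for illustration.
to_steel() {
  printf '%s\n' "$1" | sed 's/^agent-browser /steel browser /'
}

# Usage: to_steel 'agent-browser open https://example.com'
```

Lines that do not start with the agent-browser prefix come back unchanged, so the helper is safe to run over a mixed script one line at a time.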

Local runtime now lives under steel dev

Local runtime orchestration is now explicit and separate:

  • steel dev install — install dependencies
  • steel dev start — start local runtime
  • steel dev stop — stop it

This makes it easier to develop and debug workflows locally, then run the same shape of workflow in the cloud.

Get started

# Install and auth
npm i -g @steel-dev/cli
steel login

Full command reference: Steel CLI docs.

Install the skill

The skill is available in the CLI repo's skills package:

npx skills add steel-dev/cli --skill steel-browser

Or via skills.sh.

Where this helps

This is not about browsing for its own sake. It is about unlocking agents and workflows where the web is the source of truth.

  • Competitive research with verifiable screenshots
  • Lead enrichment from JavaScript-heavy sites
  • Bug reproduction with session recordings
  • QA testing on live interfaces
  • Compliance documentation captured as PDFs
  • Data extraction from gated portals

Give your agent a real browser

Run one real workflow. Tell us what worked and what didn't.

Join our Discord and share your experience with the new skill and CLI.
