Fred Santos

Give Your AI Agent a Browser: Web Automation via API with IteraTools


One of the most requested features for AI agents is the ability to actually do things on the web — fill forms, click buttons, extract data from JavaScript-heavy pages, log in to services. Until now, setting up a headless browser was a pain: install Playwright or Puppeteer, manage dependencies, handle concurrency, deal with sandboxing.

IteraTools /browser/act flips this: you send a JSON list of actions, we run a real Chromium browser server-side, and return results. One API call, no browser setup.

How It Works

The endpoint accepts an actions array where each action has a type and relevant params:

curl -X POST https://api.iteratools.com/browser/act \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "actions": [
      { "type": "navigate", "url": "https://news.ycombinator.com" },
      { "type": "waitForSelector", "selector": ".titleline" },
      { "type": "evaluate",
        "script": "Array.from(document.querySelectorAll(\".titleline a\")).slice(0,5).map(a=>({title:a.textContent,url:a.href}))" }
    ]
  }'

Response:

{
  "success": true,
  "steps": 3,
  "duration_ms": 847,
  "results": [
    {
      "step": 3,
      "type": "evaluate",
      "value": [
        {"title": "Show HN: I built an MCP server for browser automation", "url": "https://..."},
        ...
      ]
    }
  ]
}
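
In the sample above, only the step that produced data (step 3 of 3) appears in results, so it can help to index results by step number rather than by list position. The following is a minimal sketch under that reading; the helper name results_by_step is hypothetical, the sample titles and URLs are made up, and the assumption that a failed run reports "success": false is not confirmed by the response shown.

```python
def results_by_step(response):
    """Map 1-based step number -> result entry for steps that returned data."""
    # Assumption: a failed run sets "success" to false (not shown in the article).
    if not response.get("success"):
        raise RuntimeError("browser run did not report success")
    return {entry["step"]: entry for entry in response.get("results", [])}


# Sample shaped like the response above, with made-up titles and URLs.
sample = {
    "success": True,
    "steps": 3,
    "duration_ms": 847,
    "results": [
        {
            "step": 3,
            "type": "evaluate",
            "value": [{"title": "Example story", "url": "https://example.com/1"}],
        },
    ],
}

# Step 3 was the evaluate action, so its "value" holds the script's return value.
stories = results_by_step(sample)[3]["value"]
for story in stories:
    print(f'{story["title"]} -> {story["url"]}')
```

Looking results up by step keeps the calling code stable if earlier actions are added that return nothing.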

Supported Actions

Action            What it does
navigate          Go to a URL
click             Click an element by CSS selector
type              Type text into an input
press             Press a keyboard key (Enter, Tab, Escape…)
wait              Wait N milliseconds
waitForSelector   Wait until an element appears
extract           Get text/HTML from a selector
screenshot        Capture a PNG screenshot (base64)
evaluate          Run JavaScript in the page
select            Choose an option in a <select>

Up to 20 actions per request, 10 seconds per action.
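
Since a request that breaks these limits is presumably rejected, a cheap client-side check before sending can save a round trip. This is a sketch only: validate_actions and worst_case_seconds are hypothetical helpers, the action names are copied from the table above, and reading "10 seconds per action" as a per-action timeout is an assumption.

```python
# Action names copied from the table above.
SUPPORTED_ACTIONS = {
    "navigate", "click", "type", "press", "wait",
    "waitForSelector", "extract", "screenshot", "evaluate", "select",
}
MAX_ACTIONS = 20        # documented per-request limit
ACTION_TIMEOUT_S = 10   # assumption: the 10-second figure is a per-action timeout


def validate_actions(actions):
    """Raise ValueError if the list exceeds the limit or uses an unknown type.

    Hypothetical client-side helper; the server may enforce additional rules.
    """
    if len(actions) > MAX_ACTIONS:
        raise ValueError(f"{len(actions)} actions exceeds the limit of {MAX_ACTIONS}")
    for index, action in enumerate(actions, start=1):
        kind = action.get("type")
        if kind not in SUPPORTED_ACTIONS:
            raise ValueError(f"step {index}: unsupported action type {kind!r}")
    return actions


def worst_case_seconds(actions):
    """Upper bound on run time if every action used its full time budget."""
    return len(actions) * ACTION_TIMEOUT_S


actions = validate_actions([
    {"type": "navigate", "url": "https://example.com"},
    {"type": "extract", "selector": "h1"},
])
print(worst_case_seconds(actions))
```

The worst-case figure is a reasonable starting point for a client-side HTTP timeout.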

Real-World Example: AI Agent That Monitors Prices

Here's a Python function an agent can run on a schedule to check a product price:

import requests

ITERATOOLS_KEY = "it-XXXX-XXXX-XXXX"

def check_price(product_url, price_selector):
    """Return the text of the price element, or None if no result came back."""
    resp = requests.post(
        "https://api.iteratools.com/browser/act",
        headers={"Authorization": f"Bearer {ITERATOOLS_KEY}"},
        json={
            "actions": [
                {"type": "navigate", "url": product_url},
                {"type": "waitForSelector", "selector": price_selector},
                {"type": "extract", "selector": price_selector}
            ]
        },
        timeout=60,  # client-side cap; 3 actions at up to 10 seconds each fit well inside it
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    data = resp.json()
    results = data.get("results", [])
    if results:
        # extract is the last action, so its result is the last entry
        return results[-1].get("text", "N/A")
    return None

price = check_price("https://amazon.com/dp/B09XYZ", ".a-price-whole")
print(f"Current price: ${price}")

Cost: $0.005 per check. Running this hourly costs ~$3.60/month.
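
The monthly figure is 24 checks a day × 30 days × $0.005. A small sketch of the same arithmetic for other schedules, assuming a 30-day month and one billed session per check (the function names are illustrative):

```python
from decimal import Decimal

PRICE_PER_SESSION = Decimal("0.005")  # USD per check, from the article


def monthly_cost(checks_per_day, days=30):
    """Cost in USD for a given number of checks per day (30-day month assumed)."""
    return PRICE_PER_SESSION * checks_per_day * days


def sessions_for(credit_usd):
    """How many sessions a given credit balance buys."""
    return int(Decimal(str(credit_usd)) / PRICE_PER_SESSION)


print(f"hourly: ${monthly_cost(24):.2f}")  # hourly: $3.60
print(f"daily:  ${monthly_cost(1):.2f}")   # daily:  $0.15
print(sessions_for(1))                     # 200
```

Decimal is used instead of float so that fractions of a cent do not pick up rounding noise.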

Combining with Other IteraTools

The real power comes when you combine /browser/act with other tools:

  1. Navigate + extract → Send extracted text to /tts for voice readout
  2. Screenshot → Send the PNG to /image/ocr to extract text from rendered pages
  3. Extract data → Pass to /chart/generate for instant visualization
  4. Navigate to forms → Complete multi-step workflows fully automated
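
As an illustration of the multi-step form case, here is a sketch that only builds the request body for a login followed by one extraction. The action types come from the table earlier in the post, but the "text" and "key" field names, the CSS selectors, and the example URL are all assumptions for illustration; check the API docs for the real parameter names.

```python
import json


def login_and_extract(login_url, username, password, result_selector):
    """Build an actions payload: log in, then extract one element.

    Assumptions: the "type" action takes "selector" and "text", the "press"
    action takes "key", and the form uses #username / #password inputs.
    """
    return {
        "actions": [
            {"type": "navigate", "url": login_url},
            {"type": "waitForSelector", "selector": "#username"},
            {"type": "type", "selector": "#username", "text": username},
            {"type": "type", "selector": "#password", "text": password},
            {"type": "press", "key": "Enter"},
            {"type": "waitForSelector", "selector": result_selector},
            {"type": "extract", "selector": result_selector},
        ]
    }


payload = login_and_extract(
    "https://example.com/login", "demo", "not-a-real-password", ".account-balance"
)
print(json.dumps(payload, indent=2))
```

The whole flow is seven actions, comfortably inside the 20-action limit, so it fits in a single request.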

Why Not Just Use Playwright Directly?

You absolutely can. But if you're building AI agents or microservices that only occasionally need to touch a browser, a hosted endpoint has a few advantages:

  • No infra to manage — browser runs on our servers
  • No dependency hell — no Playwright/Chromium install in your Docker image
  • Works anywhere — call it from AWS Lambda, Cloudflare Workers, your LLM tool calls
  • Pay as you go — $0.005 per session, no monthly fee
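
For the LLM tool-call case, one possible shape is a JSON Schema style tool description plus a function that turns the model's arguments into an HTTP request. This is a sketch, not the official integration: the tool name browser_act, the schema layout, and build_request are hypothetical, and the exact tool format depends on the LLM framework in use.

```python
import json

# Hypothetical tool description in the JSON Schema style many
# function-calling APIs accept; adapt it to your framework.
BROWSER_ACT_TOOL = {
    "name": "browser_act",
    "description": "Run a list of browser actions in a hosted Chromium and return the results.",
    "parameters": {
        "type": "object",
        "properties": {
            "actions": {
                "type": "array",
                "maxItems": 20,  # documented per-request limit
                "items": {
                    "type": "object",
                    "properties": {"type": {"type": "string"}},
                    "required": ["type"],
                },
            }
        },
        "required": ["actions"],
    },
}


def build_request(arguments_json, api_key):
    """Turn the model's JSON-encoded tool arguments into requests.post kwargs."""
    arguments = json.loads(arguments_json)
    return {
        "url": "https://api.iteratools.com/browser/act",
        "headers": {"Authorization": f"Bearer {api_key}"},
        "json": {"actions": arguments["actions"]},
    }


kwargs = build_request(
    '{"actions": [{"type": "navigate", "url": "https://example.com"}]}',
    "it-XXXX-XXXX-XXXX",
)
print(kwargs["json"])
```

The returned dictionary can be passed straight to the requests library, for example requests.post(**kwargs, timeout=60).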

Getting Started

  1. Create a free API key at iteratools.com
  2. Add credits (start with $1 = 200 browser sessions)
  3. Start automating

Full docs: api.iteratools.com/docs
GitHub (MCP server): github.com/fredpsantos33/mcp-iteratools


IteraTools is a multi-tool API for AI agents: 24+ tools including image generation, web search, OCR, TTS, PDF, browser automation, and code execution. Pay-per-use with x402 micropayments.

Tags: ai, webdev, automation, api, llm
