Give Your AI Agent a Browser: Web Automation via API with IteraTools

#ai #webdev #api #automation


Published: [to be published on dev.to]

One of the most requested features for AI agents is the ability to actually do things on the web — fill forms, click buttons, extract data from JavaScript-heavy pages, log in to services. Until now, setting up a headless browser was a pain: install Playwright or Puppeteer, manage dependencies, handle concurrency, deal with sandboxing.

IteraTools /browser/act flips this: you send a JSON list of actions, we run a real Chromium browser server-side, and return results. One API call, no browser setup.

How It Works

The endpoint accepts an actions array where each action has a type and relevant params:

curl -X POST https://api.iteratools.com/browser/act \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{
    "actions": [
      { "type": "navigate", "url": "https://news.ycombinator.com" },
      { "type": "waitForSelector", "selector": ".titleline" },
      { "type": "evaluate",
        "script": "Array.from(document.querySelectorAll(\".titleline > a\")).slice(0,5).map(a=>({title:a.textContent,url:a.href}))" }
    ]
  }'

Response:

{
  "success": true,
  "steps": 3,
  "duration_ms": 847,
  "results": [
    {
      "step": 3,
      "type": "evaluate",
      "value": [
        {"title": "Show HN: I built an MCP server for browser automation", "url": "https://..."},
        ...
      ]
    }
  ]
}
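
To pull one step's output out of a response like the one above, a small helper is usually enough. This is a minimal sketch that assumes the response shape shown here (a `results` list whose entries carry `step`, `type`, and `value`). `result_for_step` is a hypothetical name for illustration, not part of any IteraTools SDK, and the sample payload uses a placeholder URL.

```python
from typing import Any, Optional


def result_for_step(response: dict, step: int) -> Optional[Any]:
    """Return the `value` recorded for a given step, or None if absent.

    Assumes the response shape from the example above; other action
    types may report their output under a different field name.
    """
    if not response.get("success"):
        return None
    for entry in response.get("results", []):
        if entry.get("step") == step:
            return entry.get("value")
    return None


# Sample payload mirroring the documented response (placeholder data).
sample = {
    "success": True,
    "steps": 3,
    "duration_ms": 847,
    "results": [
        {
            "step": 3,
            "type": "evaluate",
            "value": [{"title": "Example story", "url": "https://example.com"}],
        },
    ],
}

stories = result_for_step(sample, 3)
print(stories[0]["title"])
```

Looking results up by `step` rather than by list position keeps the helper working if earlier steps also start returning entries.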

Supported Actions

| Action | What it does |
| --- | --- |
| `navigate` | Go to a URL |
| `click` | Click an element by CSS selector |
| `type` | Type text into an input |
| `press` | Press a keyboard key (Enter, Tab, Escape…) |
| `wait` | Wait N milliseconds |
| `waitForSelector` | Wait until an element appears |
| `extract` | Get text/HTML from a selector |
| `screenshot` | Capture a PNG screenshot (base64) |
| `evaluate` | Run JavaScript in the page |
| `select` | Choose an option in a `<select>` |

Each request accepts up to 20 actions, and each action can run for up to 10 seconds.
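
Checking an action list on the client before sending it can save a paid round trip. The sketch below only encodes what this post states: the ten action names from the table and the 20-action limit. `validate_actions` is a hypothetical helper, and the server may enforce rules that are not reflected here.

```python
SUPPORTED_ACTIONS = {
    "navigate", "click", "type", "press", "wait",
    "waitForSelector", "extract", "screenshot", "evaluate", "select",
}
MAX_ACTIONS = 20  # per-request limit quoted in this post


def validate_actions(actions: list) -> None:
    """Raise ValueError if the list breaks a limit described in this post."""
    if len(actions) > MAX_ACTIONS:
        raise ValueError(
            f"{len(actions)} actions exceeds the limit of {MAX_ACTIONS}"
        )
    for i, action in enumerate(actions, start=1):
        kind = action.get("type")
        if kind not in SUPPORTED_ACTIONS:
            raise ValueError(f"step {i}: unsupported action type {kind!r}")


# A valid list passes silently.
validate_actions([
    {"type": "navigate", "url": "https://example.com"},
    {"type": "waitForSelector", "selector": "h1"},
    {"type": "extract", "selector": "h1"},
])
```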

Real-World Example: AI Agent That Monitors Prices

Here's a Python function an agent can run on a schedule to check a product price:

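import requests

ITERATOOLS_KEY = "it-XXXX-XXXX-XXXX"  # replace with your own key

def check_price(product_url, price_selector):
    """Return the text of the price element, or None if nothing was extracted."""
    resp = requests.post(
        "https://api.iteratools.com/browser/act",
        headers={"Authorization": f"Bearer {ITERATOOLS_KEY}"},
        json={
            "actions": [
                {"type": "navigate", "url": product_url},
                {"type": "waitForSelector", "selector": price_selector},
                {"type": "extract", "selector": price_selector}
            ]
        },
        timeout=60,  # client-side limit so a stalled request cannot hang the agent
    )
    resp.raise_for_status()  # surface HTTP errors instead of parsing an error body
    data = resp.json()
    results = data.get("results", [])
    if results:
        # The last result belongs to the final `extract` step.
        return results[-1].get("text")
    return None

price = check_price("https://amazon.com/dp/B09XYZ", ".a-price-whole")
if price is None:
    print("Price not found")
else:
    print(f"Current price: ${price}")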

Cost: $0.005 per check. Running this hourly is about 720 checks in a 30-day month, or ~$3.60/month.
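
The same arithmetic works for any schedule. This is a small sketch that assumes the $0.005 price quoted here applies to every request and that a month has 30 days; `monthly_cost` is a hypothetical helper name.

```python
PRICE_PER_REQUEST_USD = 0.005  # per-check price quoted in this post


def monthly_cost(checks_per_day: int, days: int = 30) -> float:
    """Estimated spend in USD for a fixed number of checks per day."""
    return round(checks_per_day * days * PRICE_PER_REQUEST_USD, 2)


print(monthly_cost(24))  # hourly checks
print(monthly_cost(1))   # one check per day
```

Hourly checks come to 24 × 30 × $0.005 = $3.60, and a single daily check comes to 30 × $0.005 = $0.15.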

Combining with Other IteraTools

The real power comes when you combine /browser/act with other tools:

Navigate + extract → Send extracted text to /tts for voice readout
Screenshot → Send the PNG to /image/ocr to extract text from rendered pages
Extract data → Pass to /chart/generate for instant visualization
Navigate to forms → Fill in and submit them to automate multi-step workflows end to end
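
As a rough illustration of the form workflow in the last item, a login flow could be expressed as a single action list. Treat this as a sketch under assumptions: the `text` field on the `type` action is a guess at the parameter name (this post only shows `url`, `selector`, and `script`), and the `#username`, `#password`, and submit-button selectors are placeholders for whatever the target page uses. Check the docs for the exact parameters before relying on it.

```python
def login_actions(login_url: str, username: str, password: str,
                  success_selector: str) -> list:
    """Build an action list that logs in and reads a post-login element."""
    return [
        {"type": "navigate", "url": login_url},
        {"type": "waitForSelector", "selector": "#username"},
        # ASSUMPTION: "text" is the field name for the `type` action.
        {"type": "type", "selector": "#username", "text": username},
        {"type": "type", "selector": "#password", "text": password},
        {"type": "click", "selector": "button[type=submit]"},
        # Wait for an element that only exists after a successful login.
        {"type": "waitForSelector", "selector": success_selector},
        {"type": "extract", "selector": success_selector},
    ]


actions = login_actions(
    "https://example.com/login", "demo-user", "demo-pass", ".account-name"
)
print(len(actions))
```

Seven actions fit comfortably inside one request, so the whole flow runs in a single browser session.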

Why Not Just Use Playwright Directly?

You absolutely can. But if you're building AI agents or microservices that occasionally need to touch a browser:

No infra to manage — browser runs on our servers
No dependency hell — no Playwright/Chromium install in your Docker image
Works anywhere — call it from AWS Lambda, Cloudflare Workers, your LLM tool calls
Pay as you go — $0.005 per session, no monthly fee

Getting Started

Create a free API key at iteratools.com
Add credits (start with $1 = 200 browser sessions)
Start automating

Full docs: api.iteratools.com/docs
GitHub (MCP server): github.com/fredpsantos33/mcp-iteratools

IteraTools is a multi-tool API for AI agents: 24+ tools including image generation, web search, OCR, TTS, PDF, browser automation, and code execution. Pay-per-use with x402 micropayments.

Tags: ai, webdev, automation, api, llm