Give Your AI Agent a Browser: Web Automation via API with IteraTools
Published: [to be published on dev.to — login with GitHub account]
One of the most requested features for AI agents is the ability to actually do things on the web — fill forms, click buttons, extract data from JavaScript-heavy pages, log in to services. Until now, setting up a headless browser was a pain: install Playwright or Puppeteer, manage dependencies, handle concurrency, deal with sandboxing.
IteraTools /browser/act flips this: you send a JSON list of actions, we run a real Chromium browser server-side, and return results. One API call, no browser setup.
How It Works
The endpoint accepts an actions array where each action has a type and relevant params:
curl -X POST https://api.iteratools.com/browser/act \
-H "Authorization: Bearer YOUR_KEY" \
-H "Content-Type: application/json" \
-d '{
"actions": [
{ "type": "navigate", "url": "https://news.ycombinator.com" },
{ "type": "waitForSelector", "selector": ".titleline" },
{ "type": "evaluate",
"script": "Array.from(document.querySelectorAll(\".titleline a\")).slice(0,5).map(a=>({title: "a.textContent,url:a.href}))\" }"
]
}'
Response:
{
"success": true,
"steps": 3,
"duration_ms": 847,
"results": [
{
"step": 3,
"type": "evaluate",
"value": [
{"title": "Show HN: I built an MCP server for browser automation", "url": "https://..."},
...
]
}
]
}
Supported Actions
| Action | What it does |
|---|---|
navigate |
Go to a URL |
click |
Click an element by CSS selector |
type |
Type text into an input |
press |
Press a keyboard key (Enter, Tab, Escape…) |
wait |
Wait N milliseconds |
waitForSelector |
Wait until an element appears |
extract |
Get text/HTML from a selector |
screenshot |
Capture a PNG screenshot (base64) |
evaluate |
Run JavaScript in the page |
select |
Choose an option in a <select>
|
Up to 20 actions per request, 10 seconds per action.
Real-World Example: AI Agent That Monitors Prices
Here's a Python agent that uses IteraTools to check a product price daily:
import requests
ITERATOOLS_KEY = "it-XXXX-XXXX-XXXX"
def check_price(product_url, price_selector):
resp = requests.post(
"https://api.iteratools.com/browser/act",
headers={"Authorization": f"Bearer {ITERATOOLS_KEY}"},
json={
"actions": [
{"type": "navigate", "url": product_url},
{"type": "waitForSelector", "selector": price_selector},
{"type": "extract", "selector": price_selector}
]
}
)
data = resp.json()
results = data.get("results", [])
if results:
return results[-1].get("text", "N/A")
return None
price = check_price("https://amazon.com/dp/B09XYZ", ".a-price-whole")
print(f"Current price: ${price}")
Cost: $0.005 per check. Running this hourly costs ~$3.60/month.
Combining with Other IteraTools
The real power comes when you combine /browser/act with other tools:
-
Navigate + extract → Send extracted text to
/ttsfor voice readout -
Screenshot → Send the PNG to
/image/ocrto extract text from rendered pages -
Extract data → Pass to
/chart/generatefor instant visualization - Navigate to forms → Complete multi-step workflows fully automated
Why Not Just Use Playwright Directly?
You absolutely can. But if you're building AI agents or microservices that occasionally need to touch a browser:
- No infra to manage — browser runs on our servers
- No dependency hell — no Playwright/Chromium install in your Docker image
- Works anywhere — call it from AWS Lambda, Cloudflare Workers, your LLM tool calls
- Pay as you go — $0.005 per session, no monthly fee
Getting Started
- Create a free API key at iteratools.com
- Add credits (start with $1 = 200 browser sessions)
- Start automating
Full docs: api.iteratools.com/docs
GitHub (MCP server): github.com/fredpsantos33/mcp-iteratools
IteraTools is a multi-tool API for AI agents: 24+ tools including image generation, web search, OCR, TTS, PDF, browser automation, and code execution. Pay-per-use with x402 micropayments.
Tags: ai, webdev, automation, api, llm
Top comments (0)