POST /crawl — BFS Web Crawler
We just shipped POST /crawl to IteraTools — a breadth-first web crawler that extracts structured content from multiple pages in a single API call.
This is particularly useful for AI agents that need to digest entire documentation sites, product catalogs, or any multi-page website.
What it does
Starting from a seed URL, it performs BFS traversal, visiting each page and returning:
- title — the page title
- markdown — full page content as clean markdown (up to 20,000 chars/page)
- links — outbound links found on the page
By default it stays on the same domain, so you won't accidentally crawl the entire internet.
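The traversal is ordinary breadth-first search with a same-domain guard. A minimal Python sketch of that idea (the toy link graph and `get_links` callback are ours for illustration, not the service's code):

```python
from collections import deque
from urllib.parse import urlparse

def bfs_crawl(seed, get_links, max_pages=5, same_domain=True):
    """Visit pages breadth-first from `seed`, up to `max_pages`."""
    seed_host = urlparse(seed).netloc
    queue, seen, visited = deque([seed]), {seed}, []
    while queue and len(visited) < max_pages:
        url = queue.popleft()
        visited.append(url)
        for link in get_links(url):
            if same_domain and urlparse(link).netloc != seed_host:
                continue  # stay on the seed's domain
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return visited

# Toy link graph standing in for real page fetches.
graph = {
    "https://docs.example.com": ["https://docs.example.com/guide",
                                 "https://other.com/x"],
    "https://docs.example.com/guide": ["https://docs.example.com/api"],
    "https://docs.example.com/api": [],
}
order = bfs_crawl("https://docs.example.com", lambda u: graph.get(u, []))
print(order)
```

Note how the off-domain link to `other.com` is never enqueued.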
Quick example
```bash
curl -X POST https://api.iteratools.com/crawl \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://docs.example.com","max_pages":10}'
```
Response:
```json
{
  "ok": true,
  "data": {
    "pages": [
      {
        "url": "https://docs.example.com",
        "title": "Documentation Home",
        "markdown": "# Getting Started\n\nWelcome to the docs...",
        "links": ["https://docs.example.com/guide", "https://docs.example.com/api"]
      },
      {
        "url": "https://docs.example.com/guide",
        "title": "Getting Started Guide",
        "markdown": "## Installation\n\n```bash\nnpm install example\n```...",
        "links": ["https://docs.example.com/api", "https://docs.example.com/examples"]
      }
    ],
    "total": 2,
    "crawl_time_ms": 4821
  }
}
```
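The `data.pages` array is straightforward to consume programmatically. A sketch that walks a response shaped like the example above (the sample dict below mirrors it, trimmed) and concatenates the pages into one markdown document, e.g. as LLM context:

```python
# Sample response body, trimmed from the example above.
resp = {
    "ok": True,
    "data": {
        "pages": [
            {"url": "https://docs.example.com",
             "title": "Documentation Home",
             "markdown": "# Getting Started\n\nWelcome to the docs...",
             "links": ["https://docs.example.com/guide"]},
            {"url": "https://docs.example.com/guide",
             "title": "Getting Started Guide",
             "markdown": "## Installation\n\nnpm install example...",
             "links": ["https://docs.example.com/api"]},
        ],
        "total": 2,
        "crawl_time_ms": 4821,
    },
}

# Join every crawled page into a single markdown document,
# separated by horizontal rules.
docs = "\n\n---\n\n".join(
    f"# {p['title']} ({p['url']})\n\n{p['markdown']}"
    for p in resp["data"]["pages"]
)
print(resp["data"]["total"], "pages,", len(docs), "chars")
```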
Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | required | Starting URL |
| max_pages | integer | 5 | Max pages to crawl (1–20) |
| same_domain | boolean | true | Only follow same-domain links |
| include_pattern | string | null | Regex: only crawl matching URLs |
| exclude_pattern | string | null | Regex: skip matching URLs |
Filter examples
Crawl only blog posts:
```json
{"url": "https://myblog.com", "max_pages": 20, "include_pattern": "/blog/"}
```
Skip admin and login pages:
```json
{"url": "https://mysite.com", "max_pages": 10, "exclude_pattern": "/(admin|login|logout)"}
```
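The two filters behave like ordinary regex searches against each candidate URL: `include_pattern` must match, `exclude_pattern` must not. A sketch of that matching logic (our approximation, not the server's code):

```python
import re

def url_allowed(url, include_pattern=None, exclude_pattern=None):
    """True if `url` passes the include/exclude regex filters."""
    if include_pattern and not re.search(include_pattern, url):
        return False  # include filter set, but URL doesn't match
    if exclude_pattern and re.search(exclude_pattern, url):
        return False  # URL matches the exclusion filter
    return True

print(url_allowed("https://myblog.com/blog/post-1", include_pattern="/blog/"))            # True
print(url_allowed("https://mysite.com/admin/users", exclude_pattern="/(admin|login|logout)"))  # False
```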
Pricing
$0.010 per job (up to 20 pages included) via x402 micropayment on Base (USDC), or Bearer API key.
At 10–20 pages per job, that works out to $0.0005–$0.001 per page crawled, significantly cheaper than most scraping services.
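The per-page figures follow directly from the flat job price:

```python
job_price = 0.010  # USD per /crawl job, up to 20 pages
for pages in (10, 20):
    # 10 pages -> $0.0010/page, 20 pages -> $0.0005/page
    print(f"{pages} pages -> ${job_price / pages:.4f}/page")
```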
Use cases
- Documentation ingestion — feed your AI agent an entire docs site before answering questions
- Competitive research — extract product pages, pricing, and feature lists from competitor sites
- Content auditing — scan your own site for outdated content or broken structure
- RAG pipeline seeding — combine with /embeddings to build a searchable knowledge base from any website
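For RAG seeding, each page's markdown would typically be split into chunks before embedding. A sketch of naive fixed-size chunking with overlap (the chunking scheme is our assumption; /crawl just returns the markdown):

```python
def chunk_text(text, size=500, overlap=50):
    """Split text into overlapping fixed-size chunks for embedding."""
    chunks, start = [], 0
    while start < len(text):
        chunks.append(text[start:start + size])
        start += size - overlap  # slide forward, keeping some overlap
    return chunks

# Stand-in for one page's `markdown` field from a /crawl response.
page_markdown = "# Getting Started\n\n" + "Welcome to the docs. " * 60
chunks = chunk_text(page_markdown)
print(len(chunks), "chunks")
```

The overlap keeps sentences that straddle a chunk boundary retrievable from either side.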
MCP support
Available in the mcp-iteratools package (v1.0.27+):
```json
{
  "mcpServers": {
    "iteratools": {
      "command": "npx",
      "args": ["-y", "mcp-iteratools"],
      "env": { "ITERATOOLS_API_KEY": "your-key" }
    }
  }
}
```
Try it
- iteratools.com/tools/crawl
- Full API docs
- 64 tools total, all pay-per-use