
Fred Santos

New IteraTools Endpoint: POST /crawl — BFS Web Crawler for AI Agents

We just shipped POST /crawl to IteraTools — a breadth-first web crawler that extracts structured content from multiple pages in a single API call.

This is particularly useful for AI agents that need to digest entire documentation sites, product catalogs, or any multi-page website.


What it does

Starting from a seed URL, it performs BFS traversal, visiting each page and returning:

  • title — the page title
  • markdown — full page content as clean markdown (up to 20,000 chars/page)
  • links — outbound links found on the page

By default it stays on the same domain, so you won't accidentally crawl the entire internet.
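IteraTools' server-side implementation isn't shown here, but the same-domain BFS traversal it describes can be sketched locally. The link graph and helper below are purely illustrative (the URLs mirror the example above; nothing here calls the real API):

```python
from collections import deque
from urllib.parse import urlparse

# Toy link graph standing in for fetched pages (hypothetical URLs).
LINKS = {
    "https://docs.example.com": [
        "https://docs.example.com/guide",
        "https://docs.example.com/api",
        "https://other.example.org",  # off-domain, should be skipped
    ],
    "https://docs.example.com/guide": ["https://docs.example.com/api"],
    "https://docs.example.com/api": [],
}

def bfs_crawl(seed, max_pages=5, same_domain=True):
    """Visit pages breadth-first: pages closest to the seed come first."""
    seen, order = {seed}, []
    queue = deque([seed])
    while queue and len(order) < max_pages:
        url = queue.popleft()
        order.append(url)
        for link in LINKS.get(url, []):
            if same_domain and urlparse(link).netloc != urlparse(seed).netloc:
                continue  # stay on the seed's domain
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return order

print(bfs_crawl("https://docs.example.com"))
# → ['https://docs.example.com', 'https://docs.example.com/guide', 'https://docs.example.com/api']
```

Breadth-first order matters for crawling: with a page budget like `max_pages`, you get the pages nearest the seed (usually the most important ones) before anything buried deep in the site.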


Quick example

curl -X POST https://api.iteratools.com/crawl \
  -H "Authorization: Bearer YOUR_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url":"https://docs.example.com","max_pages":10}'

Response:

{
  "ok": true,
  "data": {
    "pages": [
      {
        "url": "https://docs.example.com",
        "title": "Documentation Home",
        "markdown": "# Getting Started\n\nWelcome to the docs...",
        "links": ["https://docs.example.com/guide", "https://docs.example.com/api"]
      },
      {
        "url": "https://docs.example.com/guide",
        "title": "Getting Started Guide",
        "markdown": "## Installation\n\n```bash\nnpm install example\n```...",
        "links": ["https://docs.example.com/api", "https://docs.example.com/examples"]
      }
    ],
    "total": 2,
    "crawl_time_ms": 4821
  }
}
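Once the response comes back, an agent typically needs the pages in prompt-ready form. Here is a small helper for that, using a trimmed copy of the sample response above; the function name and separator format are my own, not part of the API:

```python
# Trimmed copy of the sample /crawl response shown above.
RESPONSE = {
    "ok": True,
    "data": {
        "pages": [
            {"url": "https://docs.example.com",
             "title": "Documentation Home",
             "markdown": "# Getting Started\n\nWelcome to the docs..."},
            {"url": "https://docs.example.com/guide",
             "title": "Getting Started Guide",
             "markdown": "## Installation\n\n..."},
        ],
        "total": 2,
        "crawl_time_ms": 4821,
    },
}

def pages_to_context(resp, max_chars=8000):
    """Concatenate crawled pages into one markdown block for an LLM prompt."""
    if not resp.get("ok"):
        raise RuntimeError("crawl failed")
    chunks = [f"## {p['title']} ({p['url']})\n\n{p['markdown']}"
              for p in resp["data"]["pages"]]
    return "\n\n---\n\n".join(chunks)[:max_chars]

context = pages_to_context(RESPONSE)
print(context[:40])
```

The `max_chars` cap is a crude but effective guard: at up to 20,000 characters per page, a 20-page crawl can easily overflow a model's context window.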

Parameters

| Parameter | Type | Default | Description |
|---|---|---|---|
| url | string | required | Starting URL |
| max_pages | integer | 5 | Max pages to crawl (1–20) |
| same_domain | boolean | true | Only follow same-domain links |
| include_pattern | string | null | Regex: only crawl matching URLs |
| exclude_pattern | string | null | Regex: skip matching URLs |

Filter examples

Crawl only blog posts:

{"url": "https://myblog.com", "max_pages": 20, "include_pattern": "/blog/"}

Skip admin and login pages:

{"url": "https://mysite.com", "max_pages": 10, "exclude_pattern": "/(admin|login|logout)"}
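It's worth sanity-checking your patterns locally before spending a crawl job on them. The sketch below assumes the filters behave like an unanchored regex search over the full URL (the docs above don't specify matching semantics, so treat that as an assumption):

```python
import re

urls = [
    "https://mysite.com/blog/post-1",
    "https://mysite.com/admin/users",
    "https://mysite.com/login",
    "https://mysite.com/pricing",
]

def should_crawl(url, include=None, exclude=None):
    """Assumed filter semantics: unanchored re.search against the full URL."""
    if include and not re.search(include, url):
        return False  # include_pattern set, URL doesn't match
    if exclude and re.search(exclude, url):
        return False  # URL matches exclude_pattern
    return True

kept = [u for u in urls if should_crawl(u, exclude=r"/(admin|login|logout)")]
print(kept)
# → ['https://mysite.com/blog/post-1', 'https://mysite.com/pricing']
```

Remember the patterns are matched against full URLs, so `"/blog/"` also matches a path segment like `/not-a/blog/page`; anchor with something like `r"mysite\.com/blog/"` if that matters.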

Pricing

$0.010 per job (up to 20 pages included) via x402 micropayment on Base (USDC), or Bearer API key.

That works out to $0.0005–$0.001 per page when you crawl 10–20 pages per job — significantly cheaper than most scraping services.


Use cases

  • Documentation ingestion — feed your AI agent an entire docs site before answering questions
  • Competitive research — extract product pages, pricing, and feature lists from competitor sites
  • Content auditing — scan your own site for outdated content or broken structure
  • RAG pipeline seeding — combine with /embeddings to build a searchable knowledge base from any website
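For the RAG seeding case, the crawled `markdown` fields usually need chunking before they go to an embeddings endpoint. A minimal chunker, with overlap so sentences split at a boundary still appear whole in one chunk (the sizes here are illustrative, not recommendations):

```python
def chunk_markdown(md, size=500, overlap=50):
    """Split one page's markdown into overlapping fixed-size chunks
    ready for an embeddings endpoint."""
    chunks, start = [], 0
    step = size - overlap
    while start < len(md):
        chunks.append(md[start:start + size])
        start += step
    return chunks

doc = "word " * 300  # stand-in for one page's 1,500-char markdown field
chunks = chunk_markdown(doc)
print(len(chunks), len(chunks[0]))
# → 4 500
```

Each consecutive pair of chunks shares its last/first 50 characters, so a fact that straddles a cut point is still retrievable from at least one chunk. Fancier strategies (splitting on markdown headings, for instance) work the same way on this output.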

MCP support

Available in the mcp-iteratools package (v1.0.27+):

{
  "mcpServers": {
    "iteratools": {
      "command": "npx",
      "args": ["-y", "mcp-iteratools"],
      "env": { "ITERATOOLS_API_KEY": "your-key" }
    }
  }
}

Try it
