DEV Community

AlterLab

Posted on • Originally published at alterlab.io

Using AlterLab as an MCP Tool to Feed Live Web Data into AI Agent Workflows

TL;DR

Use AlterLab as a Model Context Protocol (MCP) tool to give AI agents live web data. The API handles anti-bot measures, proxies, and output formatting, so your agent can focus on reasoning rather than scraping mechanics.

Why Connect a Scraping API to MCP

AI agents often need information that is newer than their training data cutoff. Instead of rebuilding a scraping pipeline for each use case, you can expose a reliable web scraping service as an MCP tool. AlterLab returns clean HTML, JSON, or Markdown from any public page while managing rotating proxies, automatic retries, and challenge resolution. This reduces the operational burden on agent developers and keeps the data the agent reasons over current.

Architecture Overview

The integration has three parts: a thin wrapper that translates MCP tool calls into AlterLab requests, the AlterLab API itself, and the agent that consumes the returned data. When the agent needs live data, the wrapper sends a POST request to AlterLab’s /v1/scrape endpoint with the target URL and the desired output format. AlterLab returns the page content, and the wrapper forwards it to the agent in the MCP response.
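To make the wrapper's translation step concrete, here is a minimal sketch of what the agent sees and what the wrapper sends. It assumes a tool named scrape_url with url and format arguments, and assumes the format identifiers are "json", "markdown", and "text"; these names are hypothetical choices, not part of AlterLab's documented API. The descriptor follows the MCP convention of a name, a description, and a JSON Schema inputSchema, and the function turns the arguments of a tools/call request into the body for /v1/scrape.

```python
import json

# Hypothetical tool descriptor, as it might appear in an MCP "tools/list" response.
SCRAPE_TOOL = {
    "name": "scrape_url",
    "description": "Fetch a public web page via AlterLab and return its content.",
    "inputSchema": {
        "type": "object",
        "properties": {
            "url": {"type": "string", "description": "Public page to scrape"},
            "format": {
                "type": "string",
                "enum": ["json", "markdown", "text"],  # assumed format identifiers
                "default": "json",
            },
        },
        "required": ["url"],
    },
}


def to_alterlab_body(call_params: dict) -> bytes:
    """Translate MCP tools/call params into an AlterLab /v1/scrape request body."""
    if call_params.get("name") != SCRAPE_TOOL["name"]:
        raise ValueError(f"unknown tool: {call_params.get('name')!r}")
    args = call_params.get("arguments", {})
    if "url" not in args:
        raise ValueError("missing required argument: url")
    fmt = args.get("format", "json")
    allowed = SCRAPE_TOOL["inputSchema"]["properties"]["format"]["enum"]
    if fmt not in allowed:
        raise ValueError(f"unsupported format: {fmt!r}")
    # Same payload shape as the curl example: target URL plus a list of formats.
    return json.dumps({"url": args["url"], "formats": [fmt]}).encode("utf-8")
```

Validating arguments in the wrapper lets it return a clear error to the agent before spending an API call on a malformed request.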

Setting Up the AlterLab MCP Tool

First, obtain an API key from AlterLab’s dashboard. The wrapper below is a minimal Python implementation that you can register as a tool with your MCP server. It accepts a URL and an optional format, calls AlterLab, and returns the result.

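```python
# alterlab_mcp.py
import json
import os
import urllib.error
import urllib.request

ALTERLAB_ENDPOINT = "https://api.alterlab.io/v1/scrape"


def scrape_url(api_key: str, url: str, fmt: str = "json") -> dict:
    """POST a scrape request to AlterLab and return the decoded JSON response."""
    # Build the JSON payload: target URL plus the requested output format.
    data = json.dumps({"url": url, "formats": [fmt]}).encode("utf-8")
    # Send the request with the API key in the X-API-Key header.
    req = urllib.request.Request(
        ALTERLAB_ENDPOINT,
        data=data,
        headers={
            "X-API-Key": api_key,
            "Content-Type": "application/json",
        },
        method="POST",
    )
    try:
        with urllib.request.urlopen(req, timeout=30) as resp:
            return json.loads(resp.read().decode("utf-8"))
    except urllib.error.HTTPError as exc:
        # Non-2xx status: return a structured error the MCP host can surface.
        return {"error": f"HTTP {exc.code}", "details": exc.read().decode("utf-8", "replace")}
    except (urllib.error.URLError, TimeoutError) as exc:
        # DNS failure, refused connection, or a timeout while reading.
        return {"error": "Network error", "details": str(exc)}


# Simplified MCP "tools/call" handler (sketch). The tool arguments arrive under
# params["arguments"]; the result is a list of content blocks plus an isError flag.
def mcp_tool_handler(request: dict) -> dict:
    args = request.get("params", {}).get("arguments", {})
    url = args.get("url")
    fmt = args.get("format", "json")
    if not url:
        return {"content": [{"type": "text", "text": "Missing url parameter"}], "isError": True}
    # Read the key from the environment (hypothetical variable name) instead of hard-coding it.
    result = scrape_url(os.environ["ALTERLAB_API_KEY"], url, fmt)
    return {
        "content": [{"type": "text", "text": json.dumps(result)}],
        "isError": "error" in result,
    }
```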
The two key steps in scrape_url are building the JSON payload and sending the POST request with the API key in the X-API-Key header. Errors are caught and returned as structured dictionaries so the MCP host can surface them to the agent.

Calling the Tool from an Agent

Once the wrapper is registered as an MCP tool, the agent calls it whenever a task requires live data, and the MCP host routes that call to the wrapper and returns the result to the agent. Before integrating with an agent framework, you can test the endpoint directly with curl:



```bash
curl -X POST https://api.alterlab.io/v1/scrape \
  -H "X-API-Key: YOUR_API_KEY" \
  -H "Content-Type: application/json" \
  -d '{"url": "https://example.com/data", "formats": ["json"]}'
```

The API returns a JSON object containing the scraped page under the text key (or markdown, if requested). Your agent can then parse this payload and incorporate it into its reasoning loop.
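A minimal sketch of that parsing step follows. It assumes the content sits at the top level of the response under a markdown or text key, as described in the paragraph above; check the API docs for the exact response schema, especially for structured JSON output. The function name and the 20,000-character cap are illustrative choices, not AlterLab defaults.

```python
def extract_page_content(response: dict, max_chars: int = 20_000) -> str:
    """Return scraped page content from an AlterLab response, trimmed for an LLM prompt."""
    # Surface structured errors (e.g. from the scrape_url wrapper) instead of parsing them.
    if "error" in response:
        raise RuntimeError(f"scrape failed: {response['error']}")
    # Assumption: content sits at the top level under "markdown" or "text";
    # prefer Markdown because it keeps headings and lists.
    for key in ("markdown", "text"):
        value = response.get(key)
        if isinstance(value, str) and value:
            # Cap the length so one large page cannot crowd out the rest of the context.
            return value[:max_chars]
    raise KeyError("response has no non-empty 'markdown' or 'text' field")
```

Raising on errors, rather than passing an error dictionary through as page content, keeps the agent from summarizing an error message as if it were the page.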

Data Formats for LLMs

AlterLab offers several output formats that map cleanly to LLM inputs:

  • JSON: Ideal when you need structured fields; the API can extract tables, lists, or custom JSON schemas via Cortex AI.
  • Markdown: Preserves headings, lists, and code blocks in a readable text format that many LLMs process well.
  • Text: Strips HTML tags, yielding plain text for simple prompts.

Selecting the right format reduces post‑processing steps. For example, if your agent needs to summarize a news article, requesting Markdown preserves hierarchy while stripping unnecessary tags.
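One way to apply this guidance is to choose the format from the kind of task before building the request. The task labels and the Markdown fallback below are hypothetical, and the format strings are assumed identifiers; the mapping simply encodes the rules from the list above.

```python
# Hypothetical task-to-format mapping following the guidance above:
# structured fields -> JSON, readable prose with hierarchy -> Markdown,
# simple prompts -> plain text.
FORMAT_FOR_TASK = {
    "extract_fields": "json",
    "summarize": "markdown",
    "keyword_search": "text",
}


def build_scrape_payload(url: str, task: str) -> dict:
    """Build a /v1/scrape request body with a format suited to the agent's task."""
    # Fall back to Markdown, which keeps structure while staying readable.
    fmt = FORMAT_FOR_TASK.get(task, "markdown")
    return {"url": url, "formats": [fmt]}
```

For a news summary, build_scrape_payload("https://example.com/news/1", "summarize") requests Markdown, matching the example in the text.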

Error Handling and Retries

Network interruptions and temporary blocks are common in web scraping. AlterLab automatically retries failed requests with exponential backoff and rotates proxies on each attempt. The MCP wrapper should still check the HTTP status codes it receives: a 429 indicates rate limiting, while a 5xx suggests a transient server-side issue. In both cases, the agent can retry after a short delay or notify a human operator.
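If a 429 or 5xx still reaches the wrapper, a small client-side retry loop can implement the "retry after a short delay" option. This is a sketch, not AlterLab's own retry logic: the function names are hypothetical, and the send and sleep callables are injected so the loop can be tested without network access. It uses exponential backoff with random jitter.

```python
import random
import time
from typing import Callable, Tuple

Response = Tuple[int, str]  # (HTTP status code, response body)


def is_retryable(status: int) -> bool:
    """429 means rate limited; 5xx suggests a transient server-side problem."""
    return status == 429 or 500 <= status <= 599


def call_with_backoff(
    send: Callable[[], Response],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Response:
    """Call send() and retry retryable statuses with exponential backoff plus jitter."""
    status, body = send()
    attempt = 1
    while is_retryable(status) and attempt < max_attempts:
        # Base delay doubles each attempt (1s, 2s, 4s, ... by default); random jitter
        # keeps many agents retrying at once from hitting the API in lockstep.
        delay = base_delay * 2 ** (attempt - 1) + random.uniform(0, base_delay)
        sleep(delay)
        status, body = send()
        attempt += 1
    # Still failing after max_attempts: return the last response so the agent
    # can decide whether to try later or escalate to a human operator.
    return status, body
```

Here send would wrap a single HTTP request. If a response includes a Retry-After header, using that value instead of the computed delay is a reasonable refinement.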

Cost Considerations

AlterLab uses pay-as-you-go pricing: you pay per successful scrape. Because the MCP tool calls the API only when the agent explicitly requests data, there are no idle costs. Review the pricing page to estimate monthly expenses from your expected request volume and average response size.
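A rough estimate is simple arithmetic: calls per day, times days, times the fraction that succeed (since only successful scrapes are billed), times the per-scrape price. The sketch below assumes a flat per-scrape price that you fill in from the pricing page; it ignores any size-based pricing, and the function name is hypothetical.

```python
def estimate_monthly_cost(
    calls_per_day: float,
    success_rate: float,
    price_per_scrape: float,
    days: int = 30,
) -> float:
    """Rough monthly spend when only successful scrapes are billed at a flat price."""
    if not 0.0 <= success_rate <= 1.0:
        raise ValueError("success_rate must be between 0 and 1")
    # Billable requests = total agent calls that actually succeed.
    billable = calls_per_day * days * success_rate
    return billable * price_per_scrape
```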

Security and Compliance

Only scrape content that is publicly accessible and permitted by the site’s terms of service. AlterLab does not bypass authentication gates or paywalls; it returns whatever a standard browser would see for an unauthenticated user. Keep API keys secret and rotate them regularly. The MCP wrapper should never log the full key; instead, reference it from an environment variable or secret manager.
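The key-handling advice can be sketched in a few lines: read the key from the environment at startup and mask it before it reaches any log line. The variable name ALTERLAB_API_KEY and both helper names are hypothetical; a secret manager client would replace load_api_key in production.

```python
import os


def load_api_key(var_name: str = "ALTERLAB_API_KEY") -> str:
    """Read the API key from an environment variable (hypothetical name) instead of hard-coding it."""
    key = os.environ.get(var_name)
    if not key:
        raise RuntimeError(f"{var_name} is not set")
    return key


def redact(key: str, visible: int = 4) -> str:
    """Mask all but the last few characters so the full key never appears in logs."""
    if len(key) <= visible:
        return "*" * len(key)
    return "*" * (len(key) - visible) + key[-visible:]
```

Logging redact(key) still lets an operator tell which key was in use during an incident, without exposing the secret.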

Further Reading

For a faster start, check out the Python SDK, which includes a pre-built client class that handles authentication and retries. See the API docs for full endpoint details, including supported output formats and webhook configuration.

Takeaway

Integrating AlterLab as an MCP tool gives AI agents reliable, up‑to‑date web data without the engineering overhead of building and maintaining a scraping infrastructure. The API’s automatic anti‑bot handling, flexible output formats, and usage‑based pricing let agents focus on reasoning while the scraping layer works in the background.

Start by obtaining an API key, deploying a thin wrapper like the example above, and registering it with your MCP host. Your agents will then be able to fetch live data on demand, improving the relevance and accuracy of their outputs.
