
Naxir

Why AI agents shouldn't scrape websites and what to do instead

Every AI agent that browses a website today receives the same thing a human does: HTML.

Navigation bars. Cookie consent modals. Ad banners. Dropdown menus. The agent has to parse all of that noise, figure out which DOM elements represent actual actions, and hope the structure doesn't change next week.

This is the wrong abstraction. The data and operations the agent needs are already on the server — they're just wrapped in presentation logic meant for human eyes.

The problem, concretely

Say an agent needs to order food from a delivery app. Here's what it gets today:

<div class="restaurant-card" data-id="r-182">
  <h2 class="name">Pizza Palace</h2>
  <button class="cta" onclick="addToCart(182)">Order now</button>
</div>

The agent needs to:

  1. Find all restaurant cards in the DOM
  2. Parse text to extract names and IDs
  3. Identify the right button to click
  4. Handle JavaScript events
  5. Repeat for every page it navigates to

All of this breaks the moment the website redesigns its frontend.
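To make the brittleness concrete, here is a minimal sketch of the scraping approach using only Python's standard-library html.parser. The class name RestaurantCardParser and the helper scrape are hypothetical, and the renamed CSS class venue-tile is an invented stand-in for a redesign.

```python
from html.parser import HTMLParser

# The card markup from the example above.
PAGE = """
<div class="restaurant-card" data-id="r-182">
  <h2 class="name">Pizza Palace</h2>
  <button class="cta" onclick="addToCart(182)">Order now</button>
</div>
"""


class RestaurantCardParser(HTMLParser):
    """Collects {id, name} pairs from restaurant cards (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.cards = []
        self._in_name = False

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        classes = (attrs.get("class") or "").split()
        if tag == "div" and "restaurant-card" in classes:
            # Hard-coded dependency on a CSS class and a data attribute.
            self.cards.append({"id": attrs.get("data-id"), "name": None})
        elif tag == "h2" and "name" in classes and self.cards:
            self._in_name = True

    def handle_data(self, data):
        if self._in_name:
            self.cards[-1]["name"] = data.strip()

    def handle_endtag(self, tag):
        if tag == "h2":
            self._in_name = False


def scrape(html):
    parser = RestaurantCardParser()
    parser.feed(html)
    return parser.cards


cards = scrape(PAGE)
# A purely cosmetic rename of the CSS class makes the scraper find nothing.
redesigned = scrape(PAGE.replace("restaurant-card", "venue-tile"))
```

The scraper does not fail loudly after the rename. It just returns an empty list, which is the kind of silent breakage an agent has to guard against on every site it visits.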

The fix: serve agents structured data, not HTML

Websites already serve different responses to different clients — mobile vs desktop, logged-in vs anonymous, API vs browser. They just don't know how to serve agents yet.

agentgate is a Python library that adds an agent-native layer to any FastAPI backend. When an agent sends the header X-Agent-Request: true, it gets a machine-readable manifest instead of HTML, served from /.well-known/agent-manifest.json. The manifest describes exactly what the site can do, with full JSON Schema for inputs and outputs.
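The idea of branching on a request header can be sketched without any framework. This is an illustration only, not agentgate's internal code: the function names is_agent_request and choose_response, and the returned dictionary shape, are assumptions made for this example.

```python
MANIFEST_PATH = "/.well-known/agent-manifest.json"


def is_agent_request(headers):
    """Return True when the request carries X-Agent-Request: true."""
    # HTTP header names are case-insensitive, so normalize before the lookup.
    normalized = {name.lower(): value for name, value in headers.items()}
    return normalized.get("x-agent-request", "").strip().lower() == "true"


def choose_response(headers):
    """Pick a response kind for a request (illustrative, not agentgate's code)."""
    if is_agent_request(headers):
        return {"kind": "manifest", "location": MANIFEST_PATH}
    return {"kind": "html", "location": "/"}


agent_choice = choose_response({"X-Agent-Request": "true"})
browser_choice = choose_response({"User-Agent": "Mozilla/5.0"})
```

Browsers never send the header, so they keep getting the normal HTML page.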

The manifest

{
  "agent_api_version": "1.0",
  "name": "FoodPanda",
  "intents": [
    {
      "name": "search_restaurants",
      "endpoint": "/agent/intents/search_restaurants",
      "input_schema": {
        "type": "object",
        "properties": {
          "query": { "type": "string" },
          "location": { "type": "string" }
        },
        "required": ["query", "location"]
      },
      "output_schema": { ... }
    }
  ],
  "flows": [
    {
      "name": "order_food",
      "orchestration": "server",
      "steps": [
        { "name": "search", "entry": true, "next_steps": ["select_restaurant"] },
        { "name": "select_restaurant", "next_steps": ["checkout"] },
        { "name": "checkout", "next_steps": [] }
      ]
    }
  ]
}

The agent discovers this at runtime — no prior knowledge of the site's API needed.
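Because the manifest carries an input schema, an agent can check a payload before sending it. The sketch below handles only a tiny subset of JSON Schema (required fields and a few primitive types) and is not a full validator. The helper names find_intent and check_input are hypothetical, and the manifest is trimmed to the fields the sketch uses.

```python
import json

MANIFEST = json.loads("""
{
  "intents": [
    {
      "name": "search_restaurants",
      "endpoint": "/agent/intents/search_restaurants",
      "input_schema": {
        "type": "object",
        "properties": {
          "query": {"type": "string"},
          "location": {"type": "string"}
        },
        "required": ["query", "location"]
      }
    }
  ]
}
""")

# Only the JSON Schema types this sketch understands.
PYTHON_TYPES = {"string": str, "boolean": bool, "array": list, "object": dict}


def find_intent(manifest, name):
    """Look up an intent by name in a parsed manifest."""
    for intent in manifest["intents"]:
        if intent["name"] == name:
            return intent
    raise KeyError(f"intent {name!r} is not in the manifest")


def check_input(schema, payload):
    """Return a list of problems; empty means the payload passes this subset."""
    problems = [
        f"missing required field: {key}"
        for key in schema.get("required", [])
        if key not in payload
    ]
    for key, value in payload.items():
        spec = schema.get("properties", {}).get(key)
        if spec is None:
            continue  # unknown fields are ignored in this sketch
        expected = PYTHON_TYPES.get(spec.get("type"))
        if expected is not None and not isinstance(value, expected):
            problems.append(f"{key} should be of type {spec['type']}")
    return problems


intent = find_intent(MANIFEST, "search_restaurants")
ok = check_input(intent["input_schema"], {"query": "pizza", "location": "Berlin"})
bad = check_input(intent["input_schema"], {"query": 42})
```

In practice a dedicated JSON Schema validator would be the safer choice; the point here is only that the manifest gives the agent enough information to catch bad calls locally.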

Two building blocks: intents and flows

Intents are single-step capabilities. One request, one typed response.

@agent_app.intent("search_restaurants", description="Search by query and city")
async def search_restaurants(query: str, location: str) -> list[Restaurant]:
    return db.search(query, location)

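Flows are multi-step workflows. This is the key insight: ordering food is not one action but a sequence of search, select a restaurant, add items, check out. Agentgate models this explicitly. In the example flow, the cart items are passed to the checkout step, so the sequence has three steps.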

@agent_app.flow("order_food", orchestration="server")
class OrderFoodFlow:

    @step(entry=True)
    async def search(self, query: str, location: str) -> list[Restaurant]:
        ...

    @step(after="search")
    async def select_restaurant(self, restaurant_id: str) -> list[MenuItem]:
        ...

    @step(after="select_restaurant")
    async def checkout(self, items: list[CartItem], address: str) -> OrderConfirmation:
        ...
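One way to picture how the after= declarations relate to the next_steps lists in the manifest is the small graph-building sketch below. It is an assumption about the mapping, not agentgate's actual implementation, and build_step_graph is a hypothetical helper. The step names are the ones from the flow class.

```python
def build_step_graph(steps):
    """Turn (name, after) pairs into manifest-style step entries.

    `after` is None for the entry step, otherwise the name of the
    step that must run first.
    """
    graph = {name: {"name": name, "next_steps": []} for name, _ in steps}
    for name, after in steps:
        if after is None:
            graph[name]["entry"] = True
        elif after not in graph:
            raise ValueError(f"step {name!r} follows unknown step {after!r}")
        else:
            # Invert the "after" relation: the predecessor lists its successors.
            graph[after]["next_steps"].append(name)
    return list(graph.values())


order_food_steps = build_step_graph([
    ("search", None),
    ("select_restaurant", "search"),
    ("checkout", "select_restaurant"),
])
```

Each step declares what it comes after, and the manifest publishes the inverse: what may come next.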

The server enforces step order: an agent cannot jump to checkout before search. If it tries before any step has run, it gets a clear error (the empty '' in the message stands for the missing previous step):

{
  "error": "step_not_allowed",
  "message": "Step 'checkout' cannot follow ''. Allowed next: ['search']",
  "allowed_next_steps": ["search"]
}
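Server-side enforcement of this kind boils down to a small state machine. The sketch below is a guess at the general shape, not agentgate's code: FlowSession, StepNotAllowed, and advance are hypothetical names, and the error payload simply mirrors the JSON shown above.

```python
class StepNotAllowed(Exception):
    """Raised when a step is requested out of order; carries the error payload."""

    def __init__(self, payload):
        super().__init__(payload["message"])
        self.payload = payload


class FlowSession:
    """Tracks the current step of one flow run (illustrative only)."""

    def __init__(self, entry_steps, next_steps):
        self._entry_steps = list(entry_steps)
        self._next_steps = next_steps
        self.current = ""  # empty string: no step has run yet

    @property
    def allowed_next(self):
        if self.current == "":
            return list(self._entry_steps)
        return list(self._next_steps[self.current])

    @property
    def is_complete(self):
        # Complete once the current step has no successors.
        return self.current != "" and not self._next_steps[self.current]

    def advance(self, step):
        allowed = self.allowed_next
        if step not in allowed:
            raise StepNotAllowed({
                "error": "step_not_allowed",
                "message": f"Step '{step}' cannot follow '{self.current}'. "
                           f"Allowed next: {allowed}",
                "allowed_next_steps": allowed,
            })
        self.current = step


session = FlowSession(
    entry_steps=["search"],
    next_steps={
        "search": ["select_restaurant"],
        "select_restaurant": ["checkout"],
        "checkout": [],
    },
)

try:
    session.advance("checkout")  # out of order: nothing has run yet
    rejected = None
except StepNotAllowed as exc:
    rejected = exc.payload

for name in ("search", "select_restaurant", "checkout"):
    session.advance(name)
```

Returning allowed_next_steps in the error lets an agent recover on its own instead of guessing what to call next.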

From the agent side

client = AgentClient(agent_id="my-agent", api_key="sk-...")
site = await client.discover("https://foodpanda.com")

session = await site.flows.order_food.start()
restaurants = await session.step("search", query="pizza", location="Berlin")
menu = await session.step("select_restaurant", restaurant_id=restaurants[0]["id"])
order = await session.step("checkout", items=[...], address="Alexanderplatz 1")

print(session.is_complete)  # True

The agent never touches HTML. It gets typed structured data at every step.

Two orchestration modes

Not every workflow needs server-side state. Agentgate supports two modes per flow:

Server-orchestrated — the server holds the session. Best for orders, payments, and multi-stage forms where the server needs to validate each transition.

Client-orchestrated — each step is a stateless independent endpoint. The agent chains data between steps itself. Best for idempotent reads like status polling.

# Client-orchestrated: no session, no state on server
@agent_app.flow("track_delivery", orchestration="client")
class TrackDeliveryFlow:
    @step(entry=True)
    async def initiate(self, order_id: str) -> TrackingInfo: ...

    @step(after="initiate")
    async def get_status(self, tracking_id: str) -> StatusUpdate: ...

On the agent side, each step is a separate call:

# Agent calls each step independently, passes data itself
tracking = await site.flows.track_delivery.steps.initiate(order_id="ORD-001")
status = await site.flows.track_delivery.steps.get_status(
    tracking_id=tracking["tracking_id"]
)

How this fits with MCP

Model Context Protocol (MCP) connects an LLM application to tools and resources such as file systems, databases, and internal APIs, exposed through MCP servers that the agent's operator configures. It expands what a model can do within a session.

Agentgate is for the other direction: public websites opting in to being structured and callable by agents on the open internet. An agent can use MCP to access its own tools (memory, calendar, code execution) while using agentgate to interact with external services (e-commerce, booking, delivery).

They address different layers of the stack and work well together.

Mounting onto an existing FastAPI app

from agentgate import AgentApp, AuthDef, step
from fastapi import FastAPI

agent_app = AgentApp(
    title="FoodPanda",
    auth=AuthDef(type="api_key", header="X-API-Key"),
)

# ... register intents and flows ...

fastapi_app = FastAPI()
agent_app.mount(fastapi_app)  # existing routes are completely untouched

That's the entire integration. Run uvicorn as normal.

Try it

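The library is open source, MIT licensed, and includes a full FoodPanda example. The example commands use paths relative to the examples/ directory, so run them from a checkout of the repository: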

pip install agentgate
uvicorn examples.foodpanda_server:fastapi_app --reload
python examples/agent_client.py

github.com/NasirMalik/agentgate

Feedback welcome — especially on the flow orchestration model and whether this maps to workflows you're actually building.
