Why Web Agents Exist
LLMs have a knowledge cutoff. Ask one "what's the latest version of LangGraph?" and it can only tell you what was in its training data. Web Agents solve this: the agent actually browses the internet and returns real-time information.
But "browsing the internet" is more complicated than it sounds:
- Web pages are HTML, not text — dumping raw HTML into context floods it with useless tags
- A single page can be tens of thousands of tokens, far more than is worth spending context on
- Agents can loop forever — page A links to B, B links to C, never stopping
- URLs can be hallucinated — LLMs will invent plausible-sounding links that don't exist
Four problems, four engineering designs: HTML cleaning, Token Budget, Step Limit, URL error handling. This article assembles them into a working Web Agent.
Architecture
The overall structure is a standard LangGraph two-node graph:
   User question
         │
         ▼
┌─────────────────────────────────────┐
│             agent_node              │
│    SystemPrompt + messages → LLM    │
│       bound_llm.invoke(msgs)        │
└────────┬────────────────────────────┘
         │
  Has tool_calls?
         │
    ┌────┴─────┐
   Yes         No (or steps >= MAX_STEPS)
    │          │
    ▼          ▼
tools_node    END
web_search /
fetch_page
    │
    └──→ agent_node (loop)
State has only two fields:
class WState(TypedDict):
    messages: Annotated[list, add_messages]  # accumulated messages
    steps: int                               # steps consumed
steps is Web Agent-specific — standard agents don't need an explicit step counter, but Web Agents can jump between pages indefinitely. A hard limit is mandatory.
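The difference between the two fields is easiest to see in a toy model of how a node's partial update gets merged into the state. The sketch below is a simplified stand-in, not LangGraph's implementation: `apply_update` and the list-concatenation reducer are hypothetical names used only for illustration, and the real `add_messages` does more (for example, message ID handling).

```python
from typing import Any, Callable, Dict

# Hypothetical, simplified model of reducer-based state merging.
Reducer = Callable[[Any, Any], Any]


def apply_update(
    state: Dict[str, Any],
    update: Dict[str, Any],
    reducers: Dict[str, Reducer],
) -> Dict[str, Any]:
    """Merge a node's partial update into the state.

    Keys that have a reducer are combined with the old value;
    keys without one are simply overwritten.
    """
    merged = dict(state)
    for key, value in update.items():
        if key in reducers:
            merged[key] = reducers[key](state[key], value)
        else:
            merged[key] = value
    return merged


# "messages" accumulates (stand-in for add_messages); "steps" has no reducer.
reducers = {"messages": lambda old, new: old + new}

state = {"messages": ["user: question"], "steps": 0}
# agent_node returns a new message and an incremented counter.
state = apply_update(state, {"messages": ["ai: tool call"], "steps": 1}, reducers)
# tools_node returns only messages, so "steps" is left untouched.
state = apply_update(state, {"messages": ["tool: result"]}, reducers)

print(state["messages"])
print(state["steps"])
```

In this model the message list grows with every node, while the counter only changes when a node explicitly returns a new value for it, which is why agent_node returns `state["steps"] + 1` itself.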
Two Tools
web_search: DuckDuckGo Search
@tool
def web_search(query: str) -> str:
    """
    Search the web with DuckDuckGo.
    Returns up to 5 results, each with title, snippet, and URL.
    Use the URLs from results to call fetch_page — never invent URLs.
    """
    try:
        resp = requests.get(
            "https://html.duckduckgo.com/html/",
            params={"q": query},
            headers=HEADERS,
            timeout=12,
        )
        soup = BeautifulSoup(resp.text, "html.parser")
        results = []
        for i, block in enumerate(soup.select(".result"), 1):
            if i > 5:
                break
            title = (block.select_one(".result__title") or soup.new_tag("x")).get_text(strip=True)
            snippet = (block.select_one(".result__snippet") or soup.new_tag("x")).get_text(strip=True)
            url_raw = (block.select_one(".result__url") or soup.new_tag("x")).get_text(strip=True)
            url = f"https://{url_raw}" if url_raw and not url_raw.startswith("http") else url_raw
            results.append(f"{i}. {title}\n {snippet}\n URL: {url}")
        return "\n\n".join(results) if results else "No results found."
    except Exception as exc:
        return f"Search error: {exc}"
Uses DuckDuckGo's HTML interface — no API key required. Parses .result CSS classes to extract title, snippet, and URL, returning structured text to the LLM.
There's a critical instruction in the tool description: Use the URLs from results to call fetch_page — never invent URLs. This is the first line of defense against URL hallucination — instructing the model at the Prompt layer where valid URLs come from.
fetch_page: Page Fetching + Cleaning
@tool
def fetch_page(url: str) -> str:
    """
    Fetch a web page and return its cleaned text (truncated to token budget).
    Only call with real URLs obtained from web_search results.
    """
    try:
        resp = requests.get(url, headers=HEADERS, timeout=12)
        resp.raise_for_status()
        full_text = clean_html(resp.text)
        orig_tokens = count_tokens(full_text)
        displayed = truncate_to_budget(full_text)
        shown_tokens = min(orig_tokens, PAGE_TOKEN_BUDGET)
        return (
            f"[URL: {url}]\n"
            f"[Size: {orig_tokens} tokens → showing {shown_tokens} tokens "
            f"(budget={PAGE_TOKEN_BUDGET})]\n\n"
            f"{displayed}"
        )
    except requests.HTTPError as exc:
        return f"HTTP {exc.response.status_code} — could not fetch {url}"
    except requests.ConnectionError:
        return f"Connection error — {url} may not exist or be unreachable"
    except Exception as exc:
        return f"Error fetching {url}: {type(exc).__name__}: {exc}"
Three steps:
- clean_html: BeautifulSoup removes script/style/nav/footer, returns plain text
- truncate_to_budget: truncates anything beyond the Token Budget
- Error classification: HTTP errors, connection errors, and other exceptions each return different safe strings
Note that requests.HTTPError and requests.ConnectionError represent two distinct failure scenarios: the former means the server responded (4xx/5xx), the latter means the connection itself failed (domain doesn't exist, network unreachable).
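The article's clean_html is not shown here, so the following is only a sketch of the cleaning step. To keep it dependency-free it swaps BeautifulSoup for the standard library's `html.parser`; `clean_html_sketch`, `_TextExtractor`, and `SKIP_TAGS` are hypothetical names, and the real helper may differ in how it joins or filters text.

```python
from html.parser import HTMLParser

# Tags whose entire contents are dropped (same set the article names).
SKIP_TAGS = {"script", "style", "nav", "footer"}


class _TextExtractor(HTMLParser):
    """Collects visible text, ignoring anything nested inside SKIP_TAGS."""

    def __init__(self) -> None:
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0   # > 0 while inside a skipped element
        self._chunks = []      # non-empty text fragments, in document order

    def handle_starttag(self, tag, attrs):
        if tag in SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            text = data.strip()
            if text:
                self._chunks.append(text)

    def text(self) -> str:
        return "\n".join(self._chunks)


def clean_html_sketch(html: str) -> str:
    """Return the page's text with script/style/nav/footer content removed."""
    parser = _TextExtractor()
    parser.feed(html)
    parser.close()
    return parser.text()


sample = (
    "<html><head><style>p{color:red}</style><script>var x = 1;</script></head>"
    "<body><nav>Menu</nav><p>Hello <b>world</b></p>"
    "<footer>Copyright</footer></body></html>"
)
cleaned = clean_html_sketch(sample)
print(cleaned)
```

Only the paragraph text survives in this sample; the navigation, footer, styles, and script are gone before any token counting happens.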
Three Engineering Guards
Guard 1: URL Error Handling
Testing a completely nonexistent domain:
fetch_page(https://totally-made-up-domain-xyz99999.org/docs/n...)
→ Connection error — https://totally-made-up-domain-xyz99999.org/docs/nonexistent may not exist or be unreachable
No crash, no exception propagation — a safe error string is returned. The LLM receives this string and can choose to try a different URL or a different search query.
This is a key guard design principle: errors are return values, not exceptions. Tool call failures shouldn't interrupt the entire Agent execution; instead, let the LLM adapt based on the error information.
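The same principle can be applied to any tool with a small wrapper. This is a sketch of the idea rather than the article's code: `errors_as_values` and `flaky_tool` are hypothetical names, and the error string format is an assumption modeled on fetch_page's catch-all branch.

```python
import functools
from typing import Callable


def errors_as_values(func: Callable[..., str]) -> Callable[..., str]:
    """Wrap a tool so that any exception becomes a descriptive string."""

    @functools.wraps(func)
    def wrapper(*args, **kwargs) -> str:
        try:
            return func(*args, **kwargs)
        except Exception as exc:
            # The LLM sees this text as the tool result and can react to it.
            return f"Error in {func.__name__}: {type(exc).__name__}: {exc}"

    return wrapper


@errors_as_values
def flaky_tool(url: str) -> str:
    """Hypothetical tool that fails on anything but an https URL."""
    if not url.startswith("https://"):
        raise ValueError(f"unsupported URL: {url}")
    return f"fetched {url}"


ok = flaky_tool("https://example.com")
failed = flaky_tool("ftp://example.com")
print(ok)
print(failed)
```

A blanket wrapper like this loses the distinction between HTTP and connection failures, so it works best as a last-resort layer around tools that already classify the errors they expect.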
Guard 2: Token Budget Truncation
Testing the langgraph page on PyPI:
fetch_page(pypi.org/project/langgraph/)
→ [Size: 4576 tokens → showing 800 tokens (budget=800)]
Original page: 4,576 tokens. After truncation: 800 tokens. That's an 82% reduction in context usage.
The truncation implementation is simple:
PAGE_TOKEN_BUDGET = 800  # max tokens of page text sent to LLM per fetch


def count_tokens(text: str) -> int:
    """Rough estimate: ~3 chars per token for English/Chinese mix."""
    return max(1, len(text) // 3)


def truncate_to_budget(text: str, budget: int = PAGE_TOKEN_BUDGET) -> str:
    if count_tokens(text) <= budget:
        return text
    cutoff = budget * 3
    return text[:cutoff] + f"\n\n[... content truncated to ~{budget}-token budget ...]"
count_tokens uses a rough estimate (3 chars ≈ 1 token), not a precise tokenizer. For truncation purposes, speed matters more than precision.
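To make the arithmetic concrete, the snippet below repeats the two helpers and runs them on a synthetic page sized to match the PyPI example. The page content (a run of "x" characters) is made up for illustration; only the lengths matter.

```python
PAGE_TOKEN_BUDGET = 800


def count_tokens(text: str) -> int:
    """Rough estimate: ~3 chars per token."""
    return max(1, len(text) // 3)


def truncate_to_budget(text: str, budget: int = PAGE_TOKEN_BUDGET) -> str:
    if count_tokens(text) <= budget:
        return text
    cutoff = budget * 3
    return text[:cutoff] + f"\n\n[... content truncated to ~{budget}-token budget ...]"


# Synthetic page with the same estimated size as the PyPI example.
page = "x" * (4576 * 3)

orig_tokens = count_tokens(page)            # estimated size of the full page
shown = truncate_to_budget(page)            # what the LLM actually receives
kept_chars = PAGE_TOKEN_BUDGET * 3          # characters kept before the marker
saving = 1 - PAGE_TOKEN_BUDGET / orig_tokens

print(orig_tokens, kept_chars, f"{saving:.1%}")
```

The returned string is slightly longer than the budget because the truncation marker is appended after the cut, which is harmless at this scale but worth knowing if the budget is ever treated as a strict ceiling.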
Guard 3: Step Limit
MAX_STEPS = 8
def router(state: WState) -> str:
    if state["steps"] >= MAX_STEPS:
        return END
    last = state["messages"][-1]
    if isinstance(last, AIMessage) and last.tool_calls:
        return "tools"
    return END
state["steps"] is incremented in every agent_node execution:
def agent_node(state: WState) -> dict:
    msgs = [SystemMessage(content=SYSTEM_PROMPT)] + state["messages"]
    response = bound_llm.invoke(msgs)
    return {"messages": [response], "steps": state["steps"] + 1}
The router checks step count before checking tool_calls. Even if the LLM wants to keep calling tools, when the step limit is reached, execution terminates. This is a hard boundary against infinite loops.
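The effect of that ordering can be checked without an LLM or LangGraph. The sketch below drives the same routing logic with a stubborn fake agent that always asks for another tool call; `fake_agent`, `run`, and the simplified state (a `wants_tool` flag instead of real messages) are hypothetical stand-ins.

```python
MAX_STEPS = 8
END = "__end__"  # stand-in for LangGraph's END sentinel


def fake_agent(state: dict) -> dict:
    """Stubborn stand-in for the LLM: it always requests another tool call."""
    return {"wants_tool": True, "steps": state["steps"] + 1}


def router(state: dict) -> str:
    # Step limit first: it wins even when a tool call is pending.
    if state["steps"] >= MAX_STEPS:
        return END
    if state["wants_tool"]:
        return "tools"
    return END


def run() -> tuple:
    """Loop agent -> router -> tools until the router returns END."""
    state = {"wants_tool": False, "steps": 0}
    tool_runs = 0
    while True:
        state.update(fake_agent(state))
        if router(state) == END:
            break
        tool_runs += 1  # the tools node would execute here
    return state["steps"], tool_runs


agent_calls, tool_runs = run()
print(agent_calls, tool_runs)  # 8 7
```

Note that in this simulation the run ends while the agent is still asking for a tool, so the last message is a tool request rather than an answer. A caller may want to detect that case and report that the step limit was reached.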
Step count is initialized at invocation time:
state = graph.invoke(
    {"messages": [HumanMessage(content=query)], "steps": 0},
    config={"recursion_limit": MAX_STEPS * 3},
)
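recursion_limit is LangGraph's built-in protection: it caps the number of graph super-steps and raises GraphRecursionError when exceeded. steps is the application-level protection, which ends the run cleanly through the router. Each agent/tools cycle takes roughly two super-steps, so 8 agent steps need about 15, and MAX_STEPS * 3 = 24 leaves enough headroom for the step limit to fire first. The two work independently.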
Run Results
======================================================================
Web Agent Demo
Model: glm-4-flash | Token budget/page: 800 | Max steps: 8
======================================================================
=== Part 3: Engineering Guards ===
──────────────────────────────────────────────────────────────────────
[Guard 1] URL error handling (bad / hallucinated URL)
fetch_page(https://totally-made-up-domain-xyz99999.org/docs/n...)
→ Connection error — https://totally-made-up-domain-xyz99999.org/docs/nonexistent may not exist or be unreachable
──────────────────────────────────────────────────────────────────────
[Guard 2] Token budget enforcement (budget=800 tokens/page)
fetch_page(pypi.org/project/langgraph/)
→ [Size: 4576 tokens → showing 800 tokens (budget=800)]
──────────────────────────────────────────────────────────────────────
[Guard 3] Step limit (MAX_STEPS=8) — agent cannot loop forever
Graph router returns END when state['steps'] >= 8
Even if tool_calls remain, execution stops.
All three guards worked as expected.
The research sections (Parts 1 & 2) hit DuckDuckGo rate limiting — searches returned empty results, and the model correctly reported failure instead of hallucinating answers. This is itself a sign the guards are effective: the agent didn't loop on empty results, it reported the failure clearly to the user.
DuckDuckGo's Limitations
The DuckDuckGo HTML interface requires no API key, but it's unreliable for production:
- Frequent requests get rate-limited or return empty results
- HTML structure can change anytime, breaking CSS selectors
- No way to negotiate or raise rate limits, so blocks are easy to trigger
Production alternatives:
| Option | Characteristics |
|---|---|
| Tavily API | Designed for LLM agents, returns structured results |
| SerpAPI | Multi-engine, stable, paid |
| Brave Search API | Generous free tier, independent index |
| Jina Reader | Specialized in page-to-text conversion, high quality |
Switching search providers only requires replacing the web_search tool implementation (Jina Reader would instead replace the fetching and cleaning inside fetch_page). The agent graph structure stays the same.
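One way to keep that swap cheap is to separate "get results" from "format results for the LLM". The following is a sketch under assumptions, not the article's code: `SearchResult`, `SearchBackend`, `format_results`, `make_web_search`, and `fake_backend` are all hypothetical names, and a real backend would call the provider's API where the fake one returns canned data.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class SearchResult:
    title: str
    snippet: str
    url: str


# Any provider is reduced to: query in, list of results out.
SearchBackend = Callable[[str], List[SearchResult]]


def format_results(results: List[SearchResult], limit: int = 5) -> str:
    """Render results in a title / snippet / URL layout for the LLM."""
    if not results:
        return "No results found."
    blocks = [
        f"{i}. {r.title}\n   {r.snippet}\n   URL: {r.url}"
        for i, r in enumerate(results[:limit], 1)
    ]
    return "\n\n".join(blocks)


def make_web_search(backend: SearchBackend) -> Callable[[str], str]:
    """Build a search function around any backend; errors become strings."""

    def web_search(query: str) -> str:
        try:
            return format_results(backend(query))
        except Exception as exc:
            return f"Search error: {exc}"

    return web_search


def fake_backend(query: str) -> List[SearchResult]:
    """Canned backend for illustration; a real one would call an API."""
    return [SearchResult("LangGraph", "Build stateful agents", "https://example.com/langgraph")]


def broken_backend(query: str) -> List[SearchResult]:
    raise RuntimeError("rate limited")


search = make_web_search(fake_backend)
output = search("langgraph latest version")
error_output = make_web_search(broken_backend)("anything")
print(output)
print(error_output)
```

Because the formatted text and the error convention stay the same, the system prompt and the graph do not need to know which provider is behind the tool.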
Complete Graph Code
from typing import Annotated, TypedDict

from langchain_core.messages import AIMessage, SystemMessage, ToolMessage
from langgraph.graph import END, StateGraph
from langgraph.graph.message import add_messages

# llm, MAX_STEPS, web_search, and fetch_page are defined earlier in the file.
TOOLS = [web_search, fetch_page]
TOOL_MAP = {t.name: t for t in TOOLS}
bound_llm = llm.bind_tools(TOOLS)

SYSTEM_PROMPT = f"""You are a web research agent. Answer the user's question by browsing the web.
Workflow:
1. Call web_search to find relevant pages.
2. Call fetch_page on promising URLs to read content.
3. If you find the answer, give a clear, concise final response.
4. If a page doesn't help, try a different search query.
Strict rules:
- Only use URLs from web_search results — never invent or guess URLs.
- If fetch_page returns an error, try a different URL or search query.
- You have at most {MAX_STEPS} total steps. Be efficient.
- Once you have enough information, stop browsing and answer directly."""


class WState(TypedDict):
    messages: Annotated[list, add_messages]
    steps: int


def agent_node(state: WState) -> dict:
    # One LLM call per step; the counter is incremented here.
    msgs = [SystemMessage(content=SYSTEM_PROMPT)] + state["messages"]
    response = bound_llm.invoke(msgs)
    return {"messages": [response], "steps": state["steps"] + 1}


def tools_node(state: WState) -> dict:
    # Execute every tool call requested by the last AI message.
    last = state["messages"][-1]
    results = []
    for tc in last.tool_calls:
        output = TOOL_MAP[tc["name"]].invoke(tc["args"])
        results.append(ToolMessage(content=str(output), tool_call_id=tc["id"]))
    return {"messages": results}


def router(state: WState) -> str:
    # Step limit is checked before tool_calls, so it always wins.
    if state["steps"] >= MAX_STEPS:
        return END
    last = state["messages"][-1]
    if isinstance(last, AIMessage) and last.tool_calls:
        return "tools"
    return END


def build_graph():
    g = StateGraph(WState)
    g.add_node("agent", agent_node)
    g.add_node("tools", tools_node)
    g.set_entry_point("agent")
    g.add_conditional_edges("agent", router, {"tools": "tools", END: END})
    g.add_edge("tools", "agent")
    return g.compile()
The compiled graph is assigned to a module-level graph variable. run_research calls graph.invoke() directly.
Design Checklist
Tool design
- [ ] HTML cleaning: remove script/style/nav/footer, keep only body text
- [ ] Error classification: HTTP error / connection error / other — each returns a safe string
- [ ] Tool description includes URL source rule: never invent URLs
Engineering guards
- [ ] Token Budget: truncate page text to a reasonable limit (800–2000 tokens)
- [ ] Step Limit: router checks step count before checking tool_calls
- [ ] Two-layer protection: application-level steps + LangGraph recursion_limit
State design
- [ ] messages: Annotated[list, add_messages] (must use a reducer, otherwise messages don't accumulate)
- [ ] steps: int (Web Agent-specific field; standard agents can omit this)
Production hardening
- [ ] Replace search tool with a reliable API-key-based solution (Tavily/SerpAPI)
- [ ] Set User-Agent to a real browser UA to avoid being rejected
- [ ] Request timeouts: timeout=12 for both search and page fetching
Summary
Three conclusions:
- Guards are independent of content quality: Tool failure doesn't equal Agent failure. Errors as return values let the LLM adapt — execution continues rather than crashing
- Token Budget is non-negotiable: The PyPI page in the demo was 4,576 tokens by the rough estimate; truncating to 800 saves about 82% of context. Without a budget, browsing several pages of that size would crowd out the rest of the context within a few steps
- Step Limit is a hard boundary: steps >= MAX_STEPS → END lives in the router, not in the Prompt. No matter how much the LLM wants to continue, the counter stops it. Don't trust "self-discipline" for safety-critical behavior
A Web Agent's essence: give the LLM controlled eyes on the internet, not unlimited network access.
References
- LangGraph StateGraph documentation
Full demo code: agent-22-web-agent