Paul Works
Stop Hand-Holding Your AI: How to Build a Real-World Web Scraping Agent with Claude Tools 🕷️

Let’s be honest. Large Language Models (LLMs) are incredibly smart, but they suffer from one crippling weakness: they are trapped in a box.

If you ask Claude to summarize a breaking news article or read documentation for a brand-new library released yesterday, it will apologize and say it doesn't have real-time internet access.

But what if you could give Claude the ability to browse the web itself?

Enter Claude Tools (Anthropic's version of Function Calling). By giving Claude tools, you transform it from a conversational chatbot into a powerhouse autonomous agent.

Instead of showing you a boring "Get Current Time" example, we are going to build something you can actually use in production today: A Web-Scraping Assistant.


🎭 How Tool Use Actually Works

Using tools with Claude isn't magic; it's a four-step conversation loop:

  1. You: "Here is a prompt, and here is a list of Python scripts (tools) you can ask me to run if you need help."
  2. Claude: "I need to read this URL. Please run the fetch_webpage tool for me."
  3. You: "Here is the raw text from the website!"
  4. Claude: "Here is the summary of the article..."

Claude never executes the code itself. It just outputs a JSON payload telling your server to run it.
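That "your server runs it" step is usually just a name-to-function lookup. Here's a minimal dispatch sketch; the `TOOL_REGISTRY` pattern and the stub function are my own illustration, not part of the Anthropic SDK:

```python
# Minimal tool-dispatch sketch. Claude only ever sends back a tool *name*
# plus an input dict -- actually running the code is entirely our job.
TOOL_REGISTRY = {}

def tool(fn):
    """Register a function under its own name so it can be run on request."""
    TOOL_REGISTRY[fn.__name__] = fn
    return fn

@tool
def fetch_webpage(url: str):
    # Stub body for illustration; the real version appears below.
    return f"(stub) would fetch {url}"

def execute_tool(name: str, tool_input: dict):
    # Look up the function Claude asked for and call it with its input.
    return TOOL_REGISTRY[name](**tool_input)
```

When Claude's JSON payload arrives, you pass its `name` and `input` straight into `execute_tool` and send the return value back.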


💻 Let's Build It: Giving Claude the Internet

First, let's write a real-world Python function using requests and BeautifulSoup to scrape any webpage and extract the text.

import requests
from bs4 import BeautifulSoup

# 1. The actual Python function
def fetch_webpage(url: str):
    try:
        response = requests.get(url, timeout=10)
        response.raise_for_status()

        # Parse HTML and extract just the text
        soup = BeautifulSoup(response.text, 'html.parser')
        text = soup.get_text(separator=' ', strip=True)

        # Truncate to avoid blowing up the context window!
        return text[:8000] 
    except Exception as e:
        return f"Failed to fetch webpage: {str(e)}"

Next, we tell Claude that this function exists by defining a JSON Schema:

# 2. Tell Claude about it
fetch_webpage_schema = {
  "name": "fetch_webpage",
  "description": "Fetches the raw text content of a given URL. Use this tool when you need to read an article, documentation, or any web page to answer the user's question.",
  "input_schema": {
    "type": "object",
    "properties": {
      "url": {
        "type": "string",
        "description": "The exact, fully qualified URL to scrape (e.g., https://example.com)"
      }
    },
    "required": ["url"]
  }
}

Pro-Tip 💡: The description field here is incredibly important. Claude reads it to decide when and whether it should use this tool! Outline exactly when the AI should reach for it.
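Since the schema already lists which inputs are required, you can sanity-check Claude's request before executing anything. A minimal check (the helper name is my own; real projects might reach for the `jsonschema` package instead):

```python
def missing_required_inputs(schema: dict, tool_input: dict) -> list:
    """Return any required keys that the model's input is missing."""
    required = schema["input_schema"].get("required", [])
    return [key for key in required if key not in tool_input]

# Example against a pared-down copy of the fetch_webpage schema:
schema = {
    "name": "fetch_webpage",
    "input_schema": {
        "type": "object",
        "properties": {"url": {"type": "string"}},
        "required": ["url"],
    },
}
```

If anything comes back missing, return a `tool_result` with `is_error: true` instead of crashing your server.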

Wiring it up to the API

Now let's ask Claude a question about recent news that it absolutely cannot answer without scraping the web.

import anthropic

client = anthropic.Anthropic() # Reads ANTHROPIC_API_KEY from your environment

messages = [{
    "role": "user", 
    "content": "Can you read this article and give me 3 bullet points summarizing it? https://dev.to/about"
}]

# 3. Make the API Call
response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1000,
    messages=messages,
    tools=[fetch_webpage_schema] # Pass the tool schema here!
)

Instead of returning a standard text message, Claude's stop_reason will be "tool_use". It is asking us to scrape the website for it!

# Claude's response looks like this:
[
  ToolUseBlock(
    type='tool_use', 
    id='toolu_01...xyz', 
    name='fetch_webpage', 
    input={'url': 'https://dev.to/about'}
  )
]

Closing the Loop

Claude is waiting for the data! The ToolUseBlock tells us exactly which tool to execute, with the url it pulled from our prompt. Let's finish the job:

# A. Save Claude's tool request to our message history
messages.append({"role": "assistant", "content": response.content})

# B. Actually execute the Python function locally
# (content can mix text and tool_use blocks, so filter by type
# instead of assuming position 0)
tool_request = next(b for b in response.content if b.type == "tool_use")
scraped_text = fetch_webpage(**tool_request.input)

# C. Send the scraped text back to Claude
messages.append({
    "role": "user",
    "content": [{
        "type": "tool_result",
        "tool_use_id": tool_request.id,
        "content": scraped_text,
        "is_error": False
    }]
})

# D. Make the final API call so Claude can read the text and answer
final_response = client.messages.create(
    model="claude-sonnet-4-5",
    max_tokens=1000,
    messages=messages,
    tools=[fetch_webpage_schema]
)

print(final_response.content[0].text)
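The two-call sequence above generalizes to a loop: keep calling the API until `stop_reason` is no longer `"tool_use"`. Here's a sketch of that loop, shown with a fake client so the control flow is runnable on its own. `FakeClient` and its canned responses are invented for illustration; the real `anthropic` client plugs into `run_agent` the same way.

```python
from types import SimpleNamespace as NS

def run_agent(client, model, messages, tools, tool_fns, max_rounds=5):
    """Keep calling the API and executing tools until Claude stops asking."""
    for _ in range(max_rounds):
        response = client.messages.create(
            model=model, max_tokens=1000, messages=messages, tools=tools
        )
        if response.stop_reason != "tool_use":
            return response  # Claude answered in plain text; we're done
        messages.append({"role": "assistant", "content": response.content})
        results = []
        for block in response.content:
            if block.type != "tool_use":
                continue
            output = tool_fns[block.name](**block.input)  # run it locally
            results.append({
                "type": "tool_result",
                "tool_use_id": block.id,
                "content": output,
            })
        messages.append({"role": "user", "content": results})
    raise RuntimeError("Agent did not finish within max_rounds")

# Fake client for the sketch: first call requests a tool, second call answers.
class FakeClient:
    def __init__(self):
        self.calls = 0
        self.messages = self  # so client.messages.create(...) resolves

    def create(self, **kwargs):
        self.calls += 1
        if self.calls == 1:
            block = NS(type="tool_use", id="toolu_1",
                       name="fetch_webpage", input={"url": "https://example.com"})
            return NS(stop_reason="tool_use", content=[block])
        return NS(stop_reason="end_turn",
                  content=[NS(type="text", text="Summary of the page.")])

final = run_agent(
    FakeClient(), "claude-sonnet-4-5",
    [{"role": "user", "content": "Summarize https://example.com"}],
    tools=[], tool_fns={"fetch_webpage": lambda url: f"(stub) text of {url}"},
)
```

The `max_rounds` cap matters in production: without it, a confused model can bounce between tool calls forever.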

Output:

"Based on the article at that URL, here are 3 key takeaways about DEV:

  1. DEV is a community of software developers getting together to help one another out.
  2. The platform is built on open-source software called Forem.
  3. They focus on fostering an inclusive, decentralized, and positive environment for developers of all backgrounds."

🚀 Where to go from here?

You've just built a fully functioning web-scraping AI Agent. But you don't have to stop at reading data. You can define tools that let Claude:

  • Create calendar invites via the Google Calendar API
  • Query your PostgreSQL database to generate custom SQL reports
  • Execute Bash commands to manipulate files on your machine (just like Devin)
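Each of those follows the same recipe: one Python function plus one schema, all passed together in the `tools` list. The two schemas below are hypothetical examples of what that might look like, not real API definitions:

```python
# Hypothetical schemas following the same shape as fetch_webpage_schema.
create_event_schema = {
    "name": "create_calendar_event",
    "description": "Creates a calendar event. Use when the user asks to schedule something.",
    "input_schema": {
        "type": "object",
        "properties": {
            "title": {"type": "string", "description": "Event title"},
            "start_iso": {"type": "string", "description": "Start time in ISO 8601"},
        },
        "required": ["title", "start_iso"],
    },
}

run_sql_schema = {
    "name": "run_sql_query",
    "description": "Runs a read-only SELECT against the reporting database.",
    "input_schema": {
        "type": "object",
        "properties": {
            "query": {"type": "string", "description": "A single SELECT statement"},
        },
        "required": ["query"],
    },
}

# Claude picks whichever tool fits the request -- or answers directly with none.
tools = [run_sql_schema, create_event_schema]
```

With several tools registered, Claude decides which one (if any) each user message calls for, based on those `description` fields.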

By bridging the gap between LLM reasoning and real-world execution, you are no longer building chatbots—you're building agents.

Take off the training wheels and give Claude a real tool today!
