How to Build Custom Tools for Copilot with Python and Zyte API
AI coding assistants like GitHub Copilot are incredibly powerful, but their capabilities are often sandboxed. They can write code, but they can't always interact with specific external APIs or access proprietary tools relevant to your project. What if you could give your AI assistant direct access to specialized tools, like a high-performance web scraping API?
That's exactly what this tutorial covers. We will build a simple MCP (Model Context Protocol) server in Python that exposes custom functions as tools for your AI assistant. To demonstrate a powerful real-world use case, we'll integrate the Zyte API to give our AI sophisticated web scraping abilities, allowing it to access and process content from virtually any website.
Why Extend Your AI's Capabilities?
By default, an AI assistant can't access live web pages if they are blocked by bot detection, nor can it interact with your company's internal staging environment or private APIs. By building a custom tool bridge, you can grant specific permissions and capabilities:
- Access Restricted Data: Allow the AI to query internal databases or APIs.
- Bypass Web Blocks: Use a robust web scraping service like Zyte API to fetch web content reliably, overcoming anti-scraping measures that would normally stop the AI.
- Perform Specialized Tasks: Create tools for complex calculations, image processing, or interacting with specific hardware.
In this guide, we'll use a Python framework to create these tools and demonstrate how to perform advanced web scraping tasks by simply chatting with Copilot.
Step 1: Setting Up Your Basic Tool Server
First, we need to create a basic server that will host our tools. We'll use FastMCP, a Python framework designed for building MCP (Model Context Protocol) servers.
Let's start with a minimal example. Create a Python file (e.g., `main.py`) and add the following code. This basic server defines a single tool called `add_two_numbers`.
```python
# main.py
from fastmcp import FastMCP

mcp = FastMCP("Demo MCP Server")

# The @mcp.tool decorator registers the function as an available tool for the AI.
@mcp.tool
def add_two_numbers(a: int, b: int) -> int:
    """Adds two integers together."""
    return a + b

if __name__ == "__main__":
    mcp.run()
```
The key element here is the `@mcp.tool` decorator. This tells the server framework to expose the `add_two_numbers` function so that AI assistants like Copilot can discover and execute it.
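Before wiring anything into your IDE, you can sanity-check the server in-process. The sketch below assumes FastMCP 2.x, whose `Client` class can connect directly to a server instance in memory; the exact shape of the returned result object can vary between versions, so treat this as a quick smoke test rather than canonical usage:

```python
# test_tools.py - in-process smoke test (assumes FastMCP 2.x's Client API)
import asyncio

from fastmcp import Client

from main import mcp  # the server instance defined above


async def smoke_test():
    # Passing the server object to Client uses an in-memory transport,
    # so no subprocess or IDE configuration is needed.
    async with Client(mcp) as client:
        tools = await client.list_tools()
        print([tool.name for tool in tools])  # expect: ['add_two_numbers']

        result = await client.call_tool("add_two_numbers", {"a": 5, "b": 6})
        print(result)  # the result payload should contain 11


asyncio.run(smoke_test())
```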
Step 2: Configuration and Verification
To make these tools discoverable by your IDE and AI assistant on a per-project basis, you need to create a configuration file.
Create a Configuration File: In your project's root directory, create a `.mcp/mcp.json` file. This JSON file tells the local environment where to find your tool server. You can often generate a template for this file using a command provided by your tool framework (e.g., `fastmcp install mcp-json main.py`).
Configuration Example (`.mcp/mcp.json`):

```json
{
  "servers": {
    "Demo MCP Server": {
      "command": "uv",
      "args": [
        "run",
        "--with",
        "fastmcp",
        "fastmcp",
        "run",
        "/path/to/main.py"
      ],
      "env": {}
    }
  }
}
```
Run and Verify:
* Start your Python server. You should see output indicating that the server is running and has discovered your tools.
* In your IDE (like VS Code), open the Extensions view (`Cmd+Shift+X` or `Ctrl+Shift+X`) and look for extensions managing AI tools. You should see your "Demo MCP Server" listed as active.
* You can also open the Copilot chat interface, find the "Configure Tools" option, and verify that the `add_two_numbers` tool is listed.
To test it, ask Copilot to perform the task, explicitly telling it to use your tool:
Prompt: "Using the MCP server, add 5 and 6."
Copilot will ask for permission to run the tool from your local server. After you allow it, it will execute the function on your server and return the result (11). This confirms the connection between your AI and your custom code is working.
Step 3: Supercharging Your AI with Web Scraping Capabilities
Now for the powerful part. Let's create a tool that fetches website content using the Zyte API. This allows us to scrape pages that are normally protected by anti-bot measures.
1. Define the New Tool Function
First, let's define the function signature in `main.py`. We'll create an `extract_html` function that takes a URL and returns the page content. It's crucial to add a descriptive docstring so the AI understands what the tool does and when to use it.
```python
# main.py - new imports and the extract_html stub
import base64
import os

import requests

# ... (previous add_two_numbers function) ...

@mcp.tool
def extract_html(url: str) -> dict:
    """
    Extracts HTML content from a given URL using the Zyte API.
    Returns the HTML content.
    """
    # API call logic will go here
    pass
```
2. Securely Handle API Keys
Never hardcode your API keys. We'll use environment variables to keep them secure. Add a check at the start of your script to ensure the API key is available before starting the server.
```python
# At the top of main.py
import os

API_KEY = os.getenv("ZYTE_API_KEY")
if API_KEY is None:
    raise ValueError("ZYTE_API_KEY environment variable not set.")
```
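If you prefer keeping the key in a project-local file rather than your shell profile, the widely used `python-dotenv` package can load it for you. This is an optional convenience, not something the tutorial requires; a minimal sketch, assuming you've run `pip install python-dotenv` and created a `.env` file containing `ZYTE_API_KEY=...`:

```python
# Optional alternative: load ZYTE_API_KEY from a local .env file.
# Assumes python-dotenv is installed (pip install python-dotenv).
import os

from dotenv import load_dotenv

load_dotenv()  # copies key=value pairs from .env into os.environ

API_KEY = os.getenv("ZYTE_API_KEY")
if API_KEY is None:
    raise ValueError("ZYTE_API_KEY environment variable not set.")
```

If you go this route, add `.env` to your `.gitignore` so the key never ends up in version control.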
3. Implement the Zyte API Call
Now, we'll implement the logic inside the `extract_html` function. We will make a request to the Zyte API endpoint, passing our target URL and API key. The Zyte API handles proxies, retries, and browser rendering, returning clean HTML.
```python
# main.py - updated extract_html function
@mcp.tool
def extract_html(url: str) -> dict:
    """
    Extracts HTML content from a given URL using the Zyte API.
    Returns the HTML content.
    """
    response = requests.post(
        "https://api.zyte.com/v1/extract",
        auth=(API_KEY, ""),  # Zyte API takes the key as the username, empty password
        json={
            "url": url,
            "httpResponseBody": True,  # Request the raw HTML body
        },
    )
    response.raise_for_status()  # Fail loudly on HTTP errors instead of a confusing KeyError
    response_data = response.json()

    # Zyte API returns the body Base64-encoded; decode it back to HTML text.
    html_content = base64.b64decode(response_data["httpResponseBody"]).decode("utf-8")
    return {"html": html_content}
```
After adding this code, restart your server. The new `extract_html` tool will now be available to your AI assistant.
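Before involving Copilot at all, it can help to confirm the API key and request shape in isolation. This is a standalone sanity check, not an MCP tool; it exercises the same endpoint and payload as `extract_html`, and `https://toscrape.com` is just a placeholder target:

```python
# check_zyte.py - standalone sanity check for the Zyte API call
import base64
import os

import requests

api_key = os.environ["ZYTE_API_KEY"]  # raises KeyError early if the key is missing

response = requests.post(
    "https://api.zyte.com/v1/extract",
    auth=(api_key, ""),
    json={"url": "https://toscrape.com", "httpResponseBody": True},
)
response.raise_for_status()

html = base64.b64decode(response.json()["httpResponseBody"]).decode("utf-8")
print(html[:200])  # print the first few hundred characters as proof of life
```

If this prints HTML, the key works, and any remaining issues are on the MCP side.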
Step 4: Real-World Application: Scraping and Parsing a Product Page
Let's see this in action. The real value isn't just fetching the HTML; it's enabling the AI to reason about data it couldn't access before.
Scenario: We want to extract product information from an e-commerce page that often blocks scrapers.
Workflow:

1. Fetch the HTML via the tool:

   Prompt: "Extract the HTML from [product page URL] using Zyte."

   Copilot will invoke your `extract_html` tool. The Zyte API will process the request, bypass any blocks, and send the full HTML back to the AI's context.

2. Ask the AI to parse the data:

   Prompt: "This is a product page. Give me CSS selectors for a standard product schema (name, price, description) that work on this page. Then, return a Python code block using BeautifulSoup to parse it."
Now, because the AI has access to the actual HTML via your tool, it can accurately generate working CSS selectors and parsing code. Without the custom tool, Copilot would likely respond with "I cannot access live websites" or provide generic, non-functional code.
Here's an example of what the AI might generate:
```python
# AI-generated code based on the fetched HTML context
from bs4 import BeautifulSoup

# The HTML content would be loaded here from the previous step's output
html_doc = """... full HTML content from Zyte API ..."""
soup = BeautifulSoup(html_doc, "html.parser")

product_schema = {
    "name": soup.select_one("h1.product-title").get_text(strip=True)
            if soup.select_one("h1.product-title") else None,
    "price": soup.select_one("span.price-amount").get_text(strip=True)
             if soup.select_one("span.price-amount") else None,
    "description": soup.select_one("div.product-description").get_text(strip=True)
                   if soup.select_one("div.product-description") else None,
}
print(product_schema)
```
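Once you're happy with the selectors, you could even fold fetching and parsing into a single tool so the AI gets structured data back in one call. The sketch below would live in `main.py` next to the other tools and reuses `API_KEY`, `requests`, and `base64` from Step 3; the tool name `extract_product` and the CSS selectors are hypothetical placeholders for whatever the AI derived on your page (it also assumes `beautifulsoup4` is installed):

```python
# Hypothetical combined tool: fetch a page via Zyte API and parse it in one step.
from bs4 import BeautifulSoup


@mcp.tool
def extract_product(url: str) -> dict:
    """Fetches a product page via the Zyte API and returns name, price, and description."""
    response = requests.post(
        "https://api.zyte.com/v1/extract",
        auth=(API_KEY, ""),
        json={"url": url, "httpResponseBody": True},
    )
    response.raise_for_status()
    html = base64.b64decode(response.json()["httpResponseBody"]).decode("utf-8")
    soup = BeautifulSoup(html, "html.parser")

    def text_or_none(selector: str):
        # Placeholder selectors: swap in whatever the AI generated for your page.
        element = soup.select_one(selector)
        return element.get_text(strip=True) if element else None

    return {
        "name": text_or_none("h1.product-title"),
        "price": text_or_none("span.price-amount"),
        "description": text_or_none("div.product-description"),
    }
```

Note that Zyte API also offers built-in, AI-powered product extraction (the `product` option shown in the full code below), which avoids hand-written selectors entirely.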
Conclusion: Unleash Your AI's Potential
Creating custom tools for your AI assistant bridges the gap between general-purpose code generation and specialized, project-specific automation. As demonstrated, building a simple Python server to expose functions is straightforward. By connecting powerful external services like the Zyte API, you effectively give your AI superpowers, enabling complex workflows like reliable web scraping and data extraction directly from your development environment.
This approach allows you to build highly specialized, powerful integrations limited only by the tools you decide to create.
Full Code
For reference, here is the complete server. Note that it uses slightly different tool names than the walkthrough (`add` instead of `add_two_numbers`, `get_html_from_zyte` instead of `extract_html`) and includes a few extra tools: `subtract`, a plain `request_html` fetcher, and `zyte_product`, which uses Zyte API's automatic, AI-powered product extraction.
```python
from fastmcp import FastMCP
from base64 import b64decode
import requests
import os

mcp = FastMCP("Demo MCP Server")


@mcp.tool
def add(a: int, b: int) -> int:
    """Add two numbers"""
    return a + b


@mcp.tool
def subtract(a: int, b: int) -> int:
    """Subtract two numbers"""
    return a - b


@mcp.tool
def request_html(url: str) -> str:
    """Fetch HTML content from a URL"""
    response = requests.get(url)
    return response.text


@mcp.tool
def zyte_product(url: str) -> dict:
    """
    Fetch product data from Zyte API and return the product JSON.
    """
    api_key = os.environ.get("ZYTE_API_KEY")
    if not api_key:
        raise RuntimeError("ZYTE_API_KEY environment variable not set")
    api_response = requests.post(
        "https://api.zyte.com/v1/extract",
        auth=(api_key, ""),
        json={
            "url": url,
            "httpResponseBody": True,
            "product": True,
            "productOptions": {"extractFrom": "httpResponseBody", "ai": True},
            "followRedirect": True,
        },
    )
    data = api_response.json()
    return data["product"]


@mcp.tool
def get_html_from_zyte(url: str) -> dict:
    """
    Fetches and returns only the HTML content from Zyte API for a given URL.
    """
    api_key = os.environ.get("ZYTE_API_KEY")
    if not api_key:
        raise RuntimeError("ZYTE_API_KEY environment variable not set")
    api_response = requests.post(
        "https://api.zyte.com/v1/extract",
        auth=(api_key, ""),
        json={
            "url": url,
            "httpResponseBody": True,
        },
    )
    http_response_body: bytes = b64decode(api_response.json()["httpResponseBody"])
    return {"html": http_response_body.decode("utf-8")}


if __name__ == "__main__":
    mcp.run()
```