John Rooney for Zyte

Your First Requests with Zyte API: 3 Game-Changing Features

Tired of wrestling with proxies, getting blocked, or writing complex parsers just to get the data you need? Let's cut through the noise. Getting web data shouldn't be a battle.

In this post, I'll show you how to make your first request with the Zyte API and walk through three powerful features that handle the hard parts of web scraping for you.

Let's get started.

The Setup: Your First Few Lines of Code

First things first, let's get our Python environment ready. I'm assuming you have Python set up and have installed the requests library (pip install requests).

Here’s the basic boilerplate to get us going. I'm importing requests to send our API call and os to securely grab my API key from an environment variable. Pro-tip: Never hardcode your API keys directly in your script! Storing them as environment variables is a much safer practice.

You'll need your Zyte API key handy, preferably stored in your environment. For testing, you can paste it directly into your code (but this isn't recommended, and be careful not to share it with anyone).
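If you haven't stored the key yet, one common way to do it on macOS/Linux shells looks like this (the key value below is a placeholder, not a real key):

```shell
# Store the key for the current shell session (macOS/Linux).
# Replace the placeholder with your real key from the Zyte dashboard.
export ZYTE_API_KEY="your-api-key-here"

# To make it permanent, add the same line to ~/.bashrc or ~/.zshrc.
```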

import requests
import os
import base64 # We'll need this for our first example
from rich import print # Optional: for pretty-printing output

# Get API key from environment variables
API_KEY = os.getenv('ZYTE_API_KEY')
if not API_KEY:
    raise ValueError("ZYTE_API_KEY environment variable not set")

# The Zyte API endpoint for extraction
ZYTE_API_URL = 'https://api.zyte.com/v1/extract'

With that out of the way, let's dive into the good stuff.

Feature 1: Get Raw HTML (Without the Proxy Hassle)

At its core, web scraping starts with fetching a page's HTML. But what about proxy rotation, user agents, and ban management? Forget about it. The API handles all of that for you.

To get the clean, raw HTML of a page, you simply make a POST request, tell the API you want the httpResponseBody, and you're done.

# The website we want to scrape
target_url = 'https://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html'

# Make the request to get the raw HTML
response = requests.post(
    ZYTE_API_URL,
    auth=(API_KEY, ''), # Authenticate with your API key
    json={
        'url': target_url,
        'httpResponseBody': True
    }
)

# The response body comes back Base64 encoded
encoded_html = response.json()['httpResponseBody']

# Decode it to get the usable HTML
html_bytes = base64.b64decode(encoded_html)
html = html_bytes.decode('utf-8')

# Print the first 400 characters of the HTML
print(html[:400])

Just like that, you have the full HTML, ready to be passed into your favourite parser like Beautiful Soup or lxml. The key takeaway? One request, zero proxy management.
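As a quick sketch of that hand-off, here's what parsing might look like (assuming `beautifulsoup4` is installed, and using an inline HTML snippet in place of the real decoded response):

```python
from bs4 import BeautifulSoup

# Stand-in for the decoded `html` string from the previous step
html = """
<html><body>
  <h1>A Light in the Attic</h1>
  <p class="price_color">£51.77</p>
</body></html>
"""

soup = BeautifulSoup(html, "html.parser")

# Pull the title and price out of the markup
title = soup.find("h1").get_text()
price = soup.find("p", class_="price_color").get_text()

print(title, price)
```

The class name `price_color` matches the markup used on books.toscrape.com, but you'd swap in whatever selectors fit your target site.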

Feature 2: Render JavaScript with a Single Parameter

What if a site relies heavily on JavaScript to load its content? In the old days, this meant firing up a heavy automation tool like Selenium or Playwright. That's a world of brittle selectors, server configuration headaches, and maintenance nightmares.

With Zyte API, you just swap one parameter. Change httpResponseBody to browserHtml, and the API will render the page in a headless browser for you.

# Make the request to get browser-rendered HTML
response = requests.post(
    ZYTE_API_URL,
    auth=(API_KEY, ''),
    json={
        'url': target_url,
        'browserHtml': True # Just change this one line!
    }
)

# The HTML comes back as a plain string, no decoding needed
browser_html = response.json()['browserHtml']

# Print the first 400 characters
print(browser_html[:400])

No extra dependencies, no browser binaries to install, and no instability. You get perfectly rendered HTML from dynamic sites with a single, simple API call. It’s that easy.

Feature 3: Ditch the Parsers with Automatic Extraction

Here’s where it gets really powerful. Why write parsers at all if you don't have to? If you're scraping common page types like products, articles, or job postings, you can tell the API to extract the data for you.

Using a behind-the-scenes machine learning model, the API can identify key information, structure it into a clean JSON object, and return it to you. For this example, I know my target URL is a product page.

# Let the API extract the product data for us
response = requests.post(
    ZYTE_API_URL,
    auth=(API_KEY, ''),
    json={
        'url': target_url,
        'product': True # Tell the API this is a product page
    }
)

# Get the structured data from the 'product' key in the response
product_data = response.json()['product']

# Let's see our structured data!
print(product_data)

Check that out! With product: True, we instantly get structured data—name, price, currency, SKU, availability, and more—without writing a single line of parsing code. This is perfect for building scalable data pipelines where a consistent schema is key.
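Once you have that dictionary, pulling out individual fields is plain Python. Here's a minimal sketch using a hypothetical payload shaped like the fields listed above (the real response includes more):

```python
# Hypothetical product payload, shaped like the fields the API returns
product_data = {
    "name": "A Light in the Attic",
    "price": "51.77",
    "currency": "GBP",
    "sku": "a897fe39b1053632",
    "availability": "InStock",
}

# Use .get() so a missing field yields None instead of raising KeyError
name = product_data.get("name")
price = float(product_data.get("price", 0))

print(f"{name}: {price} {product_data.get('currency')}")
```

Because the schema is consistent across product pages, this same access code works no matter which site the data came from.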


Your Turn to Build

We've just scratched the surface. With a single API endpoint, you can eliminate proxies, seamlessly render JavaScript, and even bypass manual parsing entirely. The focus shifts from how to get the data to what you're going to do with it.

Ready to give it a spin?

Start building your first requests and pull the web data you need, minus the hassle. Thanks for reading, and subscribe for more content on scraping the web the smart way!
