Generating PDFs in Selenium with Python

#python #selenium #webscraping #testing

Introduction.

Printing web pages to PDF is a common task — whether for generating reports, saving invoices, or archiving pages. Selenium, combined with Python, makes this task simple and automatable. In this post, we’ll go through step by step how to generate PDF files of web pages using Selenium.

Prerequisites

Python 3.8+
Google Chrome
selenium package 4.0+

Step 1: Setting up Selenium with Chrome

from selenium import webdriver

driver = webdriver.Chrome()

Step 2: Navigate to the Page You Want to Print

driver.get("https://www.selenium.dev")

Step 3: Configure Print Options

Selenium provides a PrintOptions class to configure how the PDF should look.

from selenium.webdriver.common.print_page_options import PrintOptions

print_options = PrintOptions()
print_options.orientation = "portrait"  # or "landscape"
print_options.scale = 0.60               # Adjust scale
print_options.background = True         # Include background graphics

Step 4: Generate the PDF

When you call driver.print_page(), Selenium returns the PDF as a base64 string. We must decode it to bytes before saving it to a file.

Remember this:
A PDF is a binary file → a sequence of raw bytes (e.g., %PDF-1.7 followed by compressed streams).

If you try to write the base64 string directly to disk, you’ll just get a .txt file full of gibberish characters, not a valid PDF.

To go from the base64 text → real PDF, you must decode the base64 string into raw bytes:

import base64

pdf_base64 = driver.print_page(print_options=print_options)
pdf_bytes = base64.b64decode(pdf_base64)

Step 5: Save the PDF to Disk

with open("website_print_pdf", "wb") as f:
    f.write(pdf_bytes)

"wb" = write binary mode.
PDF is a binary file, so writing in text mode ("w") will break it.

Step 6: Close the Browser

Always clean up by quitting the driver:

driver.quit()

Complete Code

import base64
from selenium import webdriver
from selenium.webdriver.common.print_page_options import PrintOptions

# Setup Chrome
driver = webdriver.Chrome()

# Navigate to page
driver.get("https://www.selenium.dev")

# Configure print options
print_options = PrintOptions()
print_options.orientation = "portrait"
print_options.scale = 0.50
print_options.background = False

# Generate PDF
pdf_base64 = driver.print_page(print_options=print_options)
pdf_bytes = base64.b64decode(pdf_base64)

# Save PDF
with open("website_print.pdf", "wb") as f:
    f.write(pdf_bytes)

# Close browser
driver.quit()

Add this to your code and see what happens.
Comment here 💪


# Save PDF
with open("output_str.txt", "w") as f:
    f.write(pdf_base64)

with open("output_bytes.pdf", "wb") as f:
    f.write(pdf_bytes)

with open("output_bytes_in_txt.txt", "wb") as f:
    f.write(pdf_bytes)

Top comments (1)

OnlineProxy • Sep 14

Selenium’s pretty much the go-to when it comes to turning dynamic web pages into PDFs, especially if you've got stuff like dashboards or e-commerce receipts that rely on JavaScript. It’s way better than other tools like PyPDF2 for rendering real browser content, but yeah, it’s not the fastest, and it doesn't give you tons of flexibility. It handles dynamic content like a champ, but if you’re dealing with massive or super complex pages, you might run into some slowdowns and heavy resource use. For automation, it works pretty seamlessly in scraping setups, and you can schedule it using cron jobs or cloud functions if you need to.