guzmanojero

How to Generate PDF Files from Web Pages Using Selenium and Python

Introduction

Printing web pages to PDF is a common task, whether for generating reports, saving invoices, or archiving pages. Selenium, combined with Python, makes this task simple to automate. In this post, we’ll go step by step through generating PDF files of web pages with Selenium.

Prerequisites

  • Python 3.8+
  • Google Chrome
  • selenium package 4.0+

Step 1: Setting up Selenium with Chrome

from selenium import webdriver

driver = webdriver.Chrome()

Step 2: Navigate to the Page You Want to Print

driver.get("https://www.selenium.dev")

Step 3: Configure Print Options

Selenium provides a PrintOptions class to configure how the PDF should look.

from selenium.webdriver.common.print_page_options import PrintOptions

print_options = PrintOptions()
print_options.orientation = "portrait"  # or "landscape"
print_options.scale = 0.60               # Adjust scale
print_options.background = True         # Include background graphics

Step 4: Generate the PDF

When you call driver.print_page(), Selenium returns the PDF as a base64 string. We must decode it to bytes before saving it to a file.

Remember this:
A PDF is a binary file → a sequence of raw bytes (e.g., %PDF-1.7 followed by compressed streams).

If you try to write the base64 string directly to disk, you’ll get a file full of base64 text that no PDF viewer can open, not a valid PDF.

To go from the base64 text → real PDF, you must decode the base64 string into raw bytes:

import base64

pdf_base64 = driver.print_page(print_options=print_options)
pdf_bytes = base64.b64decode(pdf_base64)
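
As a sanity check on the decoding step, here is a minimal sketch that decodes the string and confirms the result starts with the %PDF- signature mentioned above. The helper name decode_pdf and the sample data are assumptions made for illustration and are not part of Selenium; with a real browser you would pass in the return value of driver.print_page().

```python
import base64

PDF_MAGIC = b"%PDF-"  # signature found at the start of a PDF file


def decode_pdf(pdf_base64: str) -> bytes:
    """Decode a base64 string and check that the result looks like a PDF.

    Hypothetical helper, not a Selenium API.
    """
    # b64decode raises binascii.Error (a ValueError subclass) on bad padding.
    pdf_bytes = base64.b64decode(pdf_base64)
    if not pdf_bytes.startswith(PDF_MAGIC):
        raise ValueError("decoded data does not start with the %PDF- signature")
    return pdf_bytes


if __name__ == "__main__":
    # Stand-in for the string returned by driver.print_page().
    sample = base64.b64encode(b"%PDF-1.7\n% placeholder body").decode("ascii")
    print(decode_pdf(sample)[:8])
```

A check like this fails early, before an unusable file is written to disk.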

Step 5: Save the PDF to Disk

with open("website_print.pdf", "wb") as f:
    f.write(pdf_bytes)
  • "wb" = write binary mode.
  • A PDF is binary data, so text mode ("w") will not work: it expects a str, not bytes.
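
To see the difference between the two modes without opening a browser, the sketch below writes some stand-in bytes in binary mode and then attempts the same write in text mode. The sample bytes and file names are assumptions for illustration only.

```python
import tempfile
from pathlib import Path

# Stand-in for the decoded PDF bytes; not a real document.
pdf_bytes = b"%PDF-1.7\n% placeholder body"

with tempfile.TemporaryDirectory() as tmp:
    out_path = Path(tmp) / "website_print.pdf"

    # Binary mode accepts bytes and writes them unchanged.
    with open(out_path, "wb") as f:
        f.write(pdf_bytes)
    round_trip_ok = out_path.read_bytes() == pdf_bytes

    # Text mode expects str, so passing bytes raises TypeError.
    text_mode_error = None
    try:
        with open(Path(tmp) / "broken.pdf", "w") as f:
            f.write(pdf_bytes)
    except TypeError as exc:
        text_mode_error = exc

print(round_trip_ok, type(text_mode_error).__name__)
```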

Step 6: Close the Browser

Always clean up by quitting the driver:

driver.quit()
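
If an earlier step raises an exception, driver.quit() is never reached and the browser process can be left running. One possible guard is a small context manager built on try/finally. The quitting helper and the FakeDriver class below are hypothetical names used only to demonstrate the pattern without launching Chrome.

```python
from contextlib import contextmanager


@contextmanager
def quitting(driver):
    """Yield the driver and always call driver.quit() afterwards."""
    try:
        yield driver
    finally:
        driver.quit()


class FakeDriver:
    """Stand-in for a real WebDriver so the example runs without a browser."""

    def __init__(self):
        self.quit_called = False

    def quit(self):
        self.quit_called = True


fake = FakeDriver()
try:
    with quitting(fake) as driver:
        raise RuntimeError("simulated failure while printing")
except RuntimeError:
    pass

# quit() ran even though the body raised.
print(fake.quit_called)
```

With a real browser, the same helper would wrap the driver from Step 1, for example `with quitting(webdriver.Chrome()) as driver:`.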

Complete Code

import base64
from selenium import webdriver
from selenium.webdriver.common.print_page_options import PrintOptions

# Setup Chrome
driver = webdriver.Chrome()

# Navigate to page
driver.get("https://www.selenium.dev")

# Configure print options
print_options = PrintOptions()
print_options.orientation = "portrait"
print_options.scale = 0.60
print_options.background = True

# Generate PDF
pdf_base64 = driver.print_page(print_options=print_options)
pdf_bytes = base64.b64decode(pdf_base64)

# Save PDF
with open("website_print.pdf", "wb") as f:
    f.write(pdf_bytes)

# Close browser
driver.quit()

Add this to your code, after pdf_bytes has been created, to compare the results:

# Write the base64 text and the decoded bytes to separate files
with open("output_str.txt", "w") as f:
    f.write(pdf_base64)

with open("output_bytes.pdf", "wb") as f:
    f.write(pdf_bytes)

with open("output_bytes_in_txt.txt", "wb") as f:
    f.write(pdf_bytes)
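
What to expect from the three files can be worked out without a browser. The sketch below uses stand-in bytes (an assumption, not real Selenium output) to show that the base64 text contains only printable ASCII characters, is about one third larger than the binary data, and decodes back to exactly the same bytes.

```python
import base64
import string

# Stand-in for a real PDF: the signature followed by arbitrary binary data.
pdf_bytes = b"%PDF-1.7\n" + bytes(range(256)) * 4
pdf_base64 = base64.b64encode(pdf_bytes).decode("ascii")

# base64 emits 4 characters for every 3 input bytes, padded with '='.
expected_len = 4 * ((len(pdf_bytes) + 2) // 3)

# The encoded text only uses letters, digits, '+', '/', and '='.
base64_alphabet = set(string.ascii_letters + string.digits + "+/=")
only_printable = set(pdf_base64) <= base64_alphabet

# Decoding recovers the original bytes exactly.
round_trip_ok = base64.b64decode(pdf_base64) == pdf_bytes

print(len(pdf_bytes), len(pdf_base64), only_printable, round_trip_ok)
```

So output_str.txt opens in any text editor but is not a PDF. output_bytes.pdf and output_bytes_in_txt.txt hold identical bytes: the extension only changes which program your system opens the file with, and renaming the .txt file to .pdf gives a working PDF.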

Top comments (1)

OnlineProxy

Selenium’s pretty much the go-to when it comes to turning dynamic web pages into PDFs, especially if you've got stuff like dashboards or e-commerce receipts that rely on JavaScript. It’s way better than other tools like PyPDF2 for rendering real browser content, but yeah, it’s not the fastest, and it doesn't give you tons of flexibility. It handles dynamic content like a champ, but if you’re dealing with massive or super complex pages, you might run into some slowdowns and heavy resource use. For automation, it works pretty seamlessly in scraping setups, and you can schedule it using cron jobs or cloud functions if you need to.