Introduction.
Printing web pages to PDF is a common task — whether for generating reports, saving invoices, or archiving pages. Selenium, combined with Python, makes this task simple and automatable. In this post, we’ll go through step by step how to generate PDF files of web pages using Selenium.
Prerequisites
- Python 3.8+
- Google Chrome
- selenium package 4.0+
Step 1: Setting up Selenium with Chrome
from selenium import webdriver
driver = webdriver.Chrome()
Step 2: Navigate to the Page You Want to Print
driver.get("https://www.selenium.dev")
Step 3: Configure Print Options
Selenium provides a PrintOptions
class to configure how the PDF should look.
from selenium.webdriver.common.print_page_options import PrintOptions
print_options = PrintOptions()
print_options.orientation = "portrait" # or "landscape"
print_options.scale = 0.60 # Adjust scale
print_options.background = True # Include background graphics
Step 4: Generate the PDF
When you call driver.print_page()
, Selenium returns the PDF as a base64
string. We must decode it to bytes before saving it to a file.
Remember this:
A PDF is a binary file → a sequence of raw bytes (e.g., %PDF-1.7 followed by compressed streams).
If you try to write the base64 string directly to disk, you’ll just get a .txt file full of gibberish characters, not a valid PDF.
To go from the base64
text → real PDF, you must decode the base64
string into raw bytes:
import base64
pdf_base64 = driver.print_page(print_options=print_options)
pdf_bytes = base64.b64decode(pdf_base64)
Step 5: Save the PDF to Disk
with open("website_print_pdf", "wb") as f:
f.write(pdf_bytes)
- "
wb
" = write binary mode. - PDF is a binary file, so writing in text mode ("
w
") will break it.
Step 6: Close the Browser
Always clean up by quitting the driver:
driver.quit()
Complete Code
import base64
from selenium import webdriver
from selenium.webdriver.common.print_page_options import PrintOptions
# Setup Chrome
driver = webdriver.Chrome()
# Navigate to page
driver.get("https://www.selenium.dev")
# Configure print options
print_options = PrintOptions()
print_options.orientation = "portrait"
print_options.scale = 0.50
print_options.background = False
# Generate PDF
pdf_base64 = driver.print_page(print_options=print_options)
pdf_bytes = base64.b64decode(pdf_base64)
# Save PDF
with open("website_print.pdf", "wb") as f:
f.write(pdf_bytes)
# Close browser
driver.quit()
Add this to you code to see what happens:
# Save PDF
with open("output_str.txt", "w") as f:
f.write(pdf_base64)
with open("output_bytes.pdf", "wb") as f:
f.write(pdf_bytes)
with open("output_bytes_in_txt.txt", "wb") as f:
f.write(pdf_bytes)
Top comments (1)
Selenium’s pretty much the go-to when it comes to turning dynamic web pages into PDFs, especially if you've got stuff like dashboards or e-commerce receipts that rely on JavaScript. It’s way better than other tools like PyPDF2 for rendering real browser content, but yeah, it’s not the fastest, and it doesn't give you tons of flexibility. It handles dynamic content like a champ, but if you’re dealing with massive or super complex pages, you might run into some slowdowns and heavy resource use. For automation, it works pretty seamlessly in scraping setups, and you can schedule it using cron jobs or cloud functions if you need to.