Valentina Skakun for HasData
Scraping ZoomInfo with One Universal Script Using SeleniumBase

Learn how to scrape ZoomInfo data from search results, person profiles, and company pages using a single SeleniumBase script. Follow clear steps with explanations and see what data you can get without logging in.

Table of Contents

  1. Before You Start
  2. Prerequisites and Setup
  3. Available Data Points
  4. Universal ZoomInfo Scraping Script
  5. Anti-Scraping Measures
  6. Notes

Before You Start

ZoomInfo can be scraped in a simple and stable way: the site embeds its key data inside <script type="application/json"> blocks in the page HTML.

You can extract search results, profiles, and company details without complex CSS selectors.
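To see the idea, here's a minimal, self-contained sketch. The tag structure mirrors ZoomInfo's pattern, but the key names (persons, name, title) are made up for illustration and are not the real schema:

import json
import re

# Illustrative only: real ZoomInfo pages use different, undocumented keys
html = '<script type="application/json">{"persons": [{"name": "Jane Doe", "title": "CTO"}]}</script>'

match = re.search(r'<script type="application/json">(.*?)</script>', html, re.S)
if match:
    data = json.loads(match.group(1))
    print(data["persons"][0]["name"])  # Jane Doe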

Prerequisites and Setup

ZoomInfo shows a press-and-hold captcha if it suspects automated actions.

Simple request libraries and plain headless Selenium/Playwright won't work. You need tools like SeleniumBase, Playwright Stealth, or Patchright, which patch browser fingerprints and headers and hide the signs of headless mode.

We'll use SeleniumBase in UC mode (built on undetected-chromedriver). Install it with:

pip install seleniumbase
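To check that UC mode works after installing, you can run a quick smoke test (any URL will do):

from seleniumbase import SB

# Launch the undetectable browser, open a page, and print its title
with SB(uc=True, test=True) as sb:
    sb.uc_open_with_reconnect("https://www.zoominfo.com/", 4)
    print(sb.get_title())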

Available Data Points

Here’s what you can get from each page type:

| Page Type | Key Data Points | Notes |
| --- | --- | --- |
| Search pages | name, title, profile link, company link | First 5 pages only. Emails, phones, and images usually missing; some fields empty or null |
| Person profiles | full name, title, photo, bio, masked email/phone, work address, social links, work & education history, employer info, colleagues, similar profiles, web mentions, AI signals | Most data complete; contact info partially hidden |
| Company pages | legal name, size, employees, tech stack, finances, competitors, executives, address, social links, news, acquisitions, org charts, email patterns, hiring trends, intent signals, awards, comparables | Some contact info, historic financials, email samples, and intent data may be partial or masked |

This compact, developer-focused overview lets you quickly decide which data points you need for your project.

Universal ZoomInfo Scraping Script

This one script works for search pages, person profiles, and company pages. The steps:

  1. Set the page URL.
  2. Launch the browser in undetectable mode.
  3. Loop if multiple pages (search results only).
  4. Extract JSON from <script type="application/json">.
  5. Remove unnecessary keys (__nghData__, cta_config).
  6. Save the data in JSON.

from seleniumbase import SB
from selenium.webdriver.common.by import By
import time, json

# Base URL for the page (search, person, or company)
base_url = "https://www.zoominfo.com/people-search/<filters>"  # or person/company URL
pages = 5  # for search pages; set 1 for a single profile/company
all_data = []

with SB(uc=True, test=True) as sb:
    for page in range(1, pages + 1):
        url = f"{base_url}?pageNum={page}" if pages > 1 else base_url
        sb.uc_open_with_reconnect(url, 4)  # open with reconnect to pass bot checks
        time.sleep(1)  # wait for the JSON scripts to render
        try:
            scripts = sb.find_elements('script[type="application/json"]', by=By.CSS_SELECTOR)
            for el in scripts:
                content = el.get_attribute("innerHTML")
                if not content or not content.strip():
                    continue  # skip empty script tags
                data = json.loads(content)
                if isinstance(data, dict):  # some blocks are lists; only dicts carry these keys
                    data.pop("__nghData__", None)  # Angular hydration metadata
                    data.pop("cta_config", None)   # UI config, not page data
                all_data.append(data)
        except Exception as e:
            print(f"Page {page} error:", e)

# Save the collected data
with open("zoominfo_data.json", "w", encoding="utf-8") as f:
    json.dump(all_data, f, ensure_ascii=False, indent=2)

This JSON contains almost all available data:

  • Search pages: names, titles, profile links, company links. Emails, phones, and images mostly hidden.
  • Person pages: full personal info, masked contacts, work/education, colleagues, AI signals.
  • Company pages: legal name, employees, tech stack, finances, executives, news, acquisitions, hiring trends, awards, etc. Some fields partially masked.
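The exact key names differ by page type, so a quick way to see what you actually captured is to print the top-level keys of each saved block:

import json

# Inspect the captured blocks to decide which data points to keep
with open("zoominfo_data.json", encoding="utf-8") as f:
    blocks = json.load(f)

for i, block in enumerate(blocks):
    if isinstance(block, dict):
        print(f"Block {i}: {sorted(block.keys())}")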

Anti-Scraping Measures

ZoomInfo has strong anti-bot protections. Check status codes before parsing data:

| Status Code | Meaning | Recovery |
| --- | --- | --- |
| 200 | Success | Continue |
| 429 | Rate limited | Wait 30-60 s, retry |
| 403 | Forbidden (IP blocked) | Switch IP/proxy, retry next day |
| 503 | Service unavailable | Retry after 5 min |
| Empty 200 | Honeypot | Switch IP |
| Redirect to /error | Detected scraper | Add delay, rotate proxy |
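As a sketch, a recovery dispatcher based on this table could look like the following. It assumes you can obtain the status code and response body somehow (for example, from a logging proxy or CDP network events), since SeleniumBase doesn't expose HTTP status codes directly:

import time

def recovery_action(status, body, url):
    # Map a response to a recovery action, following the table above
    if status == 200 and not body.strip():
        return "switch_ip"      # empty 200: likely a honeypot
    if status == 200:
        return "continue"
    if status == 429:
        time.sleep(60)          # rate limited: wait 30-60 s
        return "retry"
    if status == 403:
        return "switch_proxy"   # IP blocked: retry later from a new IP
    if status == 503:
        time.sleep(300)         # service unavailable: retry after 5 min
        return "retry"
    if "/error" in url:
        return "rotate_proxy"   # detected scraper: add delay, rotate proxy
    return "retry"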

You can avoid most errors, captchas, and bans by masking the browser fingerprint, rotating proxies, adding random pauses, and backing off with longer delays when errors occur.
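For example, SB() accepts a proxy parameter directly, and random pauses help spread requests out (the proxy string below is a placeholder; use your own rotating proxy):

import random, time
from seleniumbase import SB

# "user:pass@host:port" is a placeholder for your proxy credentials
with SB(uc=True, test=True, proxy="user:pass@host:port") as sb:
    for page in range(1, 6):
        sb.uc_open_with_reconnect(f"https://www.zoominfo.com/people-search/<filters>?pageNum={page}", 4)
        time.sleep(random.uniform(2, 6))  # random pause between pages
        # ... extract the application/json blocks as shown above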

Notes

This post covers the universal ZoomInfo scraping script and key data points.

For a complete walkthrough with step-by-step explanations, visuals, and tips, check out the full article on our blog: Read the Full Article

All examples are in the GitHub repo.
