
Scrape Vital YouTube Information for Competitor Analysis

Imagine having instant access to crucial YouTube insights—video performance, viewer sentiment, and trending content—all at the click of a button. Sounds like a game-changer, right?
For YouTube creators, analyzing video performance and understanding viewer interactions is key. But doing it manually? That’s a productivity killer. Scraping vital YouTube data, however, can automate the process, saving time and providing deep insights. In this guide, we’ll build a Python script that does all the heavy lifting.

Step-by-Step Guide to Building the Scraper

Let's dive right in and make this happen.
Step 1: Installing Essential Packages
Before we can start scraping, we need some tools in our toolkit. These packages will help us interact with YouTube, handle proxies, and process data. To install them, run:

pip install selenium-wire selenium blinker==1.7.0

Now, let’s set up the libraries in our script:

from selenium.webdriver.chrome.options import Options
from seleniumwire import webdriver as wiredriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import json
import time

These imports cover everything from browser interactions to data management. The json module formats the extracted data nicely, while time lets us pause between actions so the script doesn't look too robotic.
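The fixed sleeps used later work fine, but if you want those pauses to vary a little, a small helper like this (an optional sketch using Python's random module, not part of the original script) adds some jitter:

import random

def human_pause(minimum: float = 5.0, maximum: float = 12.0) -> None:
    # Sleep for a random duration so the scraper's timing looks less mechanical
    time.sleep(random.uniform(minimum, maximum))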

Step 2: Initializing the Selenium Chrome Driver
Running scripts that directly interact with the web can expose your IP to risks. YouTube’s strict scraping policies make it even more critical to mask your identity. To avoid being blocked, we’ll use a proxy.
Here’s how to set it up:

proxy_address = ""   # host:port of your proxy
proxy_username = ""
proxy_password = ""

chrome_options = Options()

# selenium-wire handles authenticated proxies; plain Chrome flags can't pass credentials
proxy_options = {
    'proxy': {
        'http': f'http://{proxy_username}:{proxy_password}@{proxy_address}',
        'https': f'https://{proxy_username}:{proxy_password}@{proxy_address}',
        'no_proxy': 'localhost,127.0.0.1'
    }
}

driver = wiredriver.Chrome(options=chrome_options, seleniumwire_options=proxy_options)

With the proxy in place, you're all set to scrape YouTube without triggering alarm bells.
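Before pointing the driver at YouTube, it's worth a quick sanity check that traffic really exits through the proxy. One way (an optional sketch, using the public httpbin.org echo service) is:

# Optional: confirm the outgoing IP is the proxy's, not your own
driver.get("https://httpbin.org/ip")
print(driver.find_element(By.TAG_NAME, "body").text)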

Step 3: Extracting Data from YouTube
Now we’re ready to extract the good stuff—video details and viewer interactions. First, let’s load the page:

youtube_url_to_scrape = ""
driver.get(youtube_url_to_scrape)

Next, we’ll define an extract_information() function to grab key video data, such as title, description, views, likes, and comments. Here’s how we make sure everything loads before scraping:

def extract_information() -> dict:
    try:
        # Wait for the description's "...more" button, then expand it
        element = WebDriverWait(driver, 15).until(
            EC.presence_of_element_located((By.XPATH, '//*[@id="expand"]'))
        )
        element.click()

        # Press End a few times so YouTube lazy-loads the comments
        time.sleep(10)
        actions = ActionChains(driver)
        actions.send_keys(Keys.END).perform()
        time.sleep(10)
        actions.send_keys(Keys.END).perform()
        time.sleep(10)

We use WebDriverWait to make sure the description's expand button is present before clicking it. After that, we press the End key with the ActionChains class to scroll the page, which prompts YouTube to lazy-load more content (like comments).
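The three fixed ten-second pauses are a blunt instrument. If you'd rather scroll until the comments stop loading, a helper along these lines (a sketch, not part of the original script, reusing the comment selector from Step 4) would do it:

def scroll_until_stable(max_rounds: int = 10, pause: float = 3.0) -> None:
    # Keep pressing End until the number of loaded comments stops growing
    actions = ActionChains(driver)
    previous_count = -1
    for _ in range(max_rounds):
        actions.send_keys(Keys.END).perform()
        time.sleep(pause)
        current_count = len(driver.find_elements(By.XPATH, '//*[@id="author-text"]/span'))
        if current_count == previous_count:
            break
        previous_count = current_count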

Step 4: Extracting Specific Details
We’ll now extract the key data we need, including:
Video Title
Owner’s Name
Subscriber Count
Description
Publish Date
Views
Likes
Comments

video_title = driver.find_elements(By.XPATH, '//*[@id="title"]/h1')[0].text
owner = driver.find_elements(By.XPATH, '//*[@id="text"]/a')[0].text
total_number_of_subscribers = driver.find_elements(By.XPATH, "//div[@id='upload-info']//yt-formatted-string[@id='owner-sub-count']")[0].text

We grab each element using find_elements() with an XPath selector. If you're not familiar with XPath, it’s a language for navigating through elements in an HTML document. Chrome’s “Inspect” tool lets you easily copy XPath for any element you want to scrape.
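The data dictionary below also expects description, publish_date, total_views, and number_of_likes. Those selectors aren't shown above, so here is one possible way to grab them; the XPaths are assumptions about YouTube's current layout, so verify each one with the Inspect tool before relying on it:

# Assumed selectors: YouTube's markup changes often, so confirm these via Inspect
description = driver.find_elements(By.XPATH, '//*[@id="description-inline-expander"]')[0].text
publish_date = driver.find_elements(By.XPATH, '//*[@id="info"]/span[3]')[0].text
total_views = driver.find_elements(By.XPATH, '//*[@id="info"]/span[1]')[0].text
number_of_likes = driver.find_elements(By.XPATH, '//*[@id="segmented-like-button"]//span')[0].text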
For the comments, we loop through the names and content, creating a dictionary for each comment:

comment_names = driver.find_elements(By.XPATH, '//*[@id="author-text"]/span')
comment_content = driver.find_elements(By.XPATH, '//*[@id="content-text"]/span')

comment_library = []
for each in range(len(comment_names)):
    name = comment_names[each].text
    content = comment_content[each].text
    indie_comment = {'name': name, 'comment': content}
    comment_library.append(indie_comment)

Finally, all the gathered data is organized into a dictionary:

data = {
    'owner': owner,
    'subscribers': total_number_of_subscribers,
    'video_title': video_title,
    'description': description,
    'date': publish_date,
    'views': total_views,
    'likes': number_of_likes,
    'comments': comment_library
}

Step 5: Saving Data to a JSON File
Once the data is extracted, it’s time to save it for later analysis. We’ll convert the dictionary to a formatted JSON file.

def organize_write_data(data: dict):
    # Keep non-ASCII characters (emojis, comments in other languages) intact
    output = json.dumps(data, indent=2, ensure_ascii=False)
    try:
        with open("output.json", 'w', encoding='utf-8') as file:
            file.write(output)
    except Exception as err:
        print(f"Error encountered: {err}")

This function writes the data to a file called output.json—perfect for later analysis.
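From there you can feed the file into whatever analysis tool you prefer. As one example (assuming pandas is installed, which the scraper itself doesn't require), the comments drop straight into a DataFrame:

import json
import pandas as pd

with open("output.json", encoding="utf-8") as file:
    video_data = json.load(file)

# One row per comment, ready for sentiment analysis or keyword counts
comments_df = pd.DataFrame(video_data["comments"])
print(comments_df.head())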

The Complete Script

Here’s the complete script, ready to go:

from selenium.webdriver.chrome.options import Options
from seleniumwire import webdriver as wiredriver
from selenium.webdriver.common.keys import Keys
from selenium.webdriver.common.by import By
from selenium.webdriver.support.wait import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.common.action_chains import ActionChains
import json
import time

# Proxy Setup (selenium-wire handles the authenticated proxy)
proxy_address = ""
proxy_username = ""
proxy_password = ""

chrome_options = Options()
proxy_options = {
    'proxy': {
        'http': f'http://{proxy_username}:{proxy_password}@{proxy_address}',
        'https': f'https://{proxy_username}:{proxy_password}@{proxy_address}',
        'no_proxy': 'localhost,127.0.0.1'
    }
}
driver = wiredriver.Chrome(options=chrome_options, seleniumwire_options=proxy_options)

# Scraping the Page
youtube_url_to_scrape = ""
driver.get(youtube_url_to_scrape)

def extract_information() -> dict:
    # ... (include the extraction code from Steps 3 and 4 above)
    return data

# Also paste in the organize_write_data() function from Step 5

# Save to JSON
organize_write_data(extract_information())
driver.quit()

Results You Can Rely On

Once the script finishes, you’ll have an organized JSON file filled with YouTube video data—ready for analysis. The structure looks clean, with all the essential data points neatly compiled.
Using a proxy, adding delays between actions, and keeping your request volume modest reduces the chance of being blocked, but keep YouTube's terms of service in mind before scraping at scale.
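For reference, output.json mirrors the data dictionary built earlier. The values below are placeholders, not real results:

{
  "owner": "Example Channel",
  "subscribers": "1.2M subscribers",
  "video_title": "Example Video Title",
  "description": "Example description text...",
  "date": "Jan 1, 2024",
  "views": "250,000 views",
  "likes": "12K",
  "comments": [
    {"name": "@examplefan", "comment": "Great breakdown, thanks!"}
  ]
}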

Final Thoughts

Automating the process of data collection from YouTube with Python is a game-changer for creators and analysts alike. Whether you’re tracking video performance, measuring audience engagement, or spotting trends, a scraper is your ticket to better insights. And with the power of Selenium, proxies, and Python’s flexibility, you can collect data without worrying about the dreaded IP bans.
