Introduction
What is Web Scraping?
Web scraping is the process of automatically collecting data from websites using code. Instead of copying and pasting data by hand, a program visits a webpage, reads its content, and extracts exactly what you need. This is super helpful for gathering a lot of information very quickly. For this project, we will be scraping an e-commerce website called Lazada.
By the end of this article, you will learn how to:
- Understand how web scraping works on dynamic websites.
- Use Selenium to load and control a web browser.
- Extract specific product information using BeautifulSoup.
- Save your cleanly structured data into a JSON file.
Why do we need Selenium and BeautifulSoup?
Many modern websites, like Lazada, load products dynamically. This means the product details don't exist in the page's basic HTML code right away; they appear a few seconds later, rendered by JavaScript. Because of this, simple scraping tools (like the requests library) won't see them.
To fix this, we use Selenium. Selenium acts like a real user by opening a browser and waiting for the products to fully load. Once the page is ready, we use a tool called BeautifulSoup to read the HTML and grab the exact product details we want.
Prerequisites
To follow along with this project, you will need:
- A basic understanding of Python.
- A code editor (like VS Code or PyCharm).
Step 1: Create a virtual environment
A virtual environment keeps your project's dependencies isolated from the rest of your system. Open your terminal and create one using:
python -m venv venv
Activate it with:
(Windows)
venv\Scripts\activate
(Mac/Linux)
source venv/bin/activate
Step 2: Install the required libraries
Make sure you have Selenium and BeautifulSoup installed by typing these commands into your terminal:
pip install selenium
pip install beautifulsoup4
Alright, let’s do this noh!
Before we write our scraper, we need a plan. We are going to collect basic product details from Lazada's search results page. We want to grab the following for each product:
- Product Name
- Price
- Number of items sold
- Number of reviews
- Seller location
- Link to the product
All of this information is visible right on the main search cards, so we don't even need to click into each product individually.
To extract these details, we need to identify specific elements in the HTML that contain the data we want. You can use your browser’s Inspect tool to see the HTML structure.
Building the Scraper Step-by-Step
1. Getting User Input
First, our program needs to ask the user what they want to search for and how many pages they want to scrape.
search_item = input("Enter a product: ")
max_pages = int(input("Enter how many pages: "))
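One thing to watch out for: int(input(...)) will crash with a ValueError if the user types something non-numeric. A small guard like the one below (a sketch, not part of the original script) makes this friendlier:

```python
def read_page_count(raw: str) -> int:
    # int() raises ValueError on non-numeric text; fall back to 1 page.
    try:
        return max(1, int(raw))
    except ValueError:
        return 1

print(read_page_count("2"))    # 2
print(read_page_count("oops")) # 1
```

You would then call it as max_pages = read_page_count(input("Enter how many pages: ")).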
The script uses the product name to build the URL. For example, if you type "laptop", the URL becomes https://www.lazada.com.ph/tag/laptop. A loop then visits multiple pages automatically based on the number you typed in; each results page displays 40 items.
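The URL construction can be sketched like this. Note that quoting the search term handles spaces (e.g. "mechanical keyboard"), and that the page query parameter is my assumption about how Lazada paginates; check your browser's address bar on page 2 to confirm:

```python
from urllib.parse import quote

def build_page_url(search_item: str, page: int) -> str:
    # quote() escapes spaces and special characters in the search term.
    # "?page=N" is an assumed pagination scheme; verify it on the site.
    return f"https://www.lazada.com.ph/tag/{quote(search_item)}?page={page}"

print(build_page_url("laptop", 2))
# https://www.lazada.com.ph/tag/laptop?page=2
```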
2. Setting Up Selenium
Next, we tell Selenium to open Google Chrome, but we want it to be sneaky!
from selenium import webdriver
from selenium.webdriver.chrome.options import Options

options = Options()
options.add_argument("--headless=new")
options.add_argument("--disable-blink-features=AutomationControlled")

driver = webdriver.Chrome(options=options)
driver.get(url)  # url built from the search term in step 1
- Headless mode: This tells the browser to run invisibly in the background. You won't actually see a window pop up, which makes the code run faster.
- AutomationControlled: Disabling this feature helps hide the fact that a bot is controlling the browser, reducing the chance that Lazada will block us.
3. Waiting for Products to Load
This is the most important step for dynamic websites! The program pauses and waits up to 10 seconds until the product container with a class name Bm3ON appears on the screen. If we don't wait, the scraper might try to read the page before the products actually show up.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC

WebDriverWait(driver, 10).until(
    EC.presence_of_element_located((By.CSS_SELECTOR, "div.Bm3ON"))
)
html = driver.page_source
Once the page is fully loaded, BeautifulSoup reads this messy HTML and turns it into a neat structure that our code can easily search through to find the product containers.
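To make that "neat structure" concrete, here is a minimal sketch using stand-in HTML (real pages are much larger, and the Bm3ON class name comes from the article and may change over time):

```python
from bs4 import BeautifulSoup

# Stand-in for driver.page_source with two fake product cards.
html = """
<div class="Bm3ON"><a title="Laptop A" href="//www.lazada.com.ph/products/a"></a></div>
<div class="Bm3ON"><a title="Laptop B" href="//www.lazada.com.ph/products/b"></a></div>
"""

soup = BeautifulSoup(html, "html.parser")
items = soup.select("div.Bm3ON")  # one entry per product card
print(len(items))  # 2
```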
4. Extracting the Data
Now we loop through every single product container on the page and pull out the details.
for item in soup.select("div.Bm3ON"):
    name = None
    link = None

    anchor = item.select_one("a[title]")
    if anchor:
        name = anchor.get("title")
        href = anchor.get("href")
        if href:
            # Lazada links are often protocol-relative ("//www...")
            link = href if href.startswith("http") else "https:" + href

    price_element = item.select_one("span.ooOxS")
    price = price_element.get_text(strip=True) if price_element else None

    sold_element = item.select_one("span._1cEkb")
    sold = sold_element.get_text(strip=True) if sold_element else None

    reviews_element = item.select_one("span.qzqFw")
    reviews = reviews_element.get_text(strip=True) if reviews_element else None

    location_element = item.select_one("span.oa6ri")
    location = location_element.get_text(strip=True) if location_element else None

    products.append({
        "name": name,
        "price": price,
        "sold": sold,
        "reviews": reviews,
        "location": location,
        "link": link,
    })
- We use CSS selectors to pinpoint the exact text we want.
- We use if statements as a safety net. If a product is missing a review count or a location, the program just labels it as None instead of crashing.
- Finally, it saves all these details into a list.
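The "select, then check for None" pattern above repeats four times, so it could be factored into a small helper. This is an optional refactor sketch, not part of the original script:

```python
from bs4 import BeautifulSoup

def text_or_none(item, selector):
    """Return the stripped text of the first match, or None when absent."""
    element = item.select_one(selector)
    return element.get_text(strip=True) if element else None

# Quick check against a stand-in product card (class names from the article).
card = BeautifulSoup(
    '<div class="Bm3ON"><span class="ooOxS">₱1,250</span></div>', "html.parser"
)
print(text_or_none(card, "span.ooOxS"))  # ₱1,250
print(text_or_none(card, "span.qzqFw"))  # None
```

Inside the loop, each extraction then becomes a one-liner like price = text_or_none(item, "span.ooOxS").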
5. Saving Data to JSON and Closing Up
After gathering all the data across all the pages, we save it into a file and shut down the robot browser.
- Saving: The data is exported as a JSON file.
with open(f"data/{search_item}.json", "w", encoding="utf-8") as f:
    json.dump(products, f, indent=4, ensure_ascii=False)
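The ensure_ascii=False flag is what keeps the peso sign (₱) readable in the file instead of an escape sequence like \u20b1. A quick round-trip check, using a temporary stand-in filename rather than the script's real one:

```python
import json
import os
import tempfile

products = [{"name": "Sample Keyboard", "price": "₱1,250"}]  # stand-in data

path = os.path.join(tempfile.gettempdir(), "sample_products.json")
with open(path, "w", encoding="utf-8") as f:
    json.dump(products, f, indent=4, ensure_ascii=False)  # keeps ₱ as-is

with open(path, encoding="utf-8") as f:
    loaded = json.load(f)
print(loaded[0]["price"])  # ₱1,250
```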
- Closing: Selenium safely closes the hidden Chrome browser so it doesn't waste your computer's memory.
driver.quit()
And that's it! Your web scraper is done.
How to Use the Scraper
The program will ask you what you want to search for and how many pages to scrape. Type your answers and hit Enter!
Enter a product: mechanical keyboard
Enter how many pages: 2
Scraping...
Saved 80 products to mechanical keyboard.json
Sample Output
Because we saved our extracted data as a JSON file, it is highly structured and easy to read. If you open your new mechanical keyboard.json file, you will see a neatly organized list of dictionaries that looks something like this:
[
{
"name": "RGB Mechanical Gaming Keyboard Blue Switch",
"price": "₱1,250",
"sold": "54 Sold",
"reviews": "12 Reviews",
"location": "Metro Manila",
"link": "https://www.lazada.com.ph/products/example-link"
},
{
"name": "Wireless 65% Mechanical Keyboard Hot-Swappable",
"price": "₱2,100",
"sold": "850 Sold",
"reviews": "315 Reviews",
"location": "Overseas",
"link": "https://www.lazada.com.ph/products/example-link-2"
}
]
Summary
- Takes user input for the product name and page count.
- Opens a hidden browser using Selenium.
- Waits patiently for the dynamic products to appear.
- Extracts specific details (price, name, sold count) using BeautifulSoup.
- Saves everything neatly into a JSON file.
- Closes the browser safely.
Conclusion
Congratulations! You now understand the core mechanics of scraping dynamic e-commerce websites. By combining Selenium's ability to automate a real web browser with BeautifulSoup's efficient data parsing, you can bypass the limitations of basic HTML scraping and gather valuable data from modern, JavaScript-heavy sites like Lazada.
As a final reminder, e-commerce sites frequently update their layouts and class names (like div.Bm3ON), so if your script ever starts returning empty data, just inspect the webpage again and update your CSS selectors. Always scrape responsibly, avoid sending too many requests too quickly, and happy coding!