Data is only valuable if people can actually use it. For many developers, a web scraping project ends once they have a clean JSON file or a database entry. However, for marketing and growth teams, these formats often create a bottleneck. These teams live in spreadsheets. Every time they need updated pricing or product details, they have to ask an engineer for a fresh export.
This creates a cycle of manual exports and imports that wastes everyone's time. This guide breaks that cycle. We will build a self-serve Etsy competitor tracker that bridges the gap between Python and Google Sheets.
By the end of this tutorial, you'll have a pipeline built with Python, Playwright, and the Google Sheets API: a non-technical user pastes Etsy URLs into a Google Sheet, runs a script, and watches the spreadsheet populate with product titles, prices, and shop details.
Prerequisites
To follow along, you'll need:
- Python 3.8+ installed on your machine.
- A Google Cloud Account (the free tier works fine).
- Basic familiarity with Python functions and CSS selectors.
- The following libraries:
playwright, gspread, and oauth2client.
Phase 1: Setup Google Cloud and API Credentials
Before writing any Python, we need to give the script permission to communicate with Google Sheets. This is the most common point of failure, so follow these steps carefully.
1. Create a Google Cloud Project
Go to the Google Cloud Console and create a new project named "Etsy-Tracker."
2. Enable APIs
Search for and enable these two APIs:
- Google Sheets API
- Google Drive API
3. Create a Service Account
Navigate to APIs & Services > Credentials. Click Create Credentials and select Service Account. Name it "sheets-scraper" and click Create and Continue. You can skip the optional role assignments.
4. Generate the JSON Key
Once the account is created, click on it, go to the Keys tab, and select Add Key > Create New Key. Choose JSON. This downloads a file to your computer. Rename this file to credentials.json and move it to your project folder.
5. Share the Sheet
Open the Google Sheet you want to use. Then open your credentials.json file and find the client_email field. It will look like sheets-scraper@your-project.iam.gserviceaccount.com. Share your Google Sheet with this email address, giving it Editor access. Without this step, gspread will raise a SpreadsheetNotFound (404) error when your script tries to open the sheet.
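If you're unsure which address to share the sheet with, you can read it straight out of the key file. A minimal sketch (the helper name is my own, not part of any library):

```python
import json

def service_account_email(key_path="credentials.json"):
    """Return the client_email from a downloaded service-account key file."""
    with open(key_path) as f:
        return json.load(f)["client_email"]
```

Print the result once after downloading the key, and you'll have the exact address to paste into the Share dialog.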
Phase 2: Building the Etsy Scraper with Playwright
Playwright is ideal for this project because Etsy uses various anti-bot measures and dynamic content. Since Playwright runs a real browser, the script is harder to distinguish from a real shopper.
First, install the dependencies and the browser binaries:
```shell
pip install playwright gspread oauth2client
playwright install chromium
```
Create a file named scraper.py. This function takes a URL and returns a dictionary of clean data.
```python
from playwright.sync_api import sync_playwright
import re

def scrape_etsy_product(url):
    with sync_playwright() as p:
        browser = p.chromium.launch(headless=True)
        context = browser.new_context(
            user_agent="Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/119.0.0.0 Safari/537.36"
        )
        page = context.new_page()
        try:
            page.goto(url, wait_until="domcontentloaded", timeout=60000)

            # Etsy uses specific utility classes for product data
            title = page.locator("h1.wt-text-body-01").inner_text(timeout=5000).strip()

            # Clean the price string (e.g., "$25.00+" becomes 25.00)
            raw_price = page.locator(".wt-p-xs-1 .wt-text-title-03").first.inner_text().strip()
            price = re.sub(r'[^\d.]', '', raw_price)

            shop_name = page.locator("a.wt-text-link-no-underline span").first.inner_text().strip()

            return {
                "Title": title,
                "Price": float(price) if price else 0.0,
                "Shop": shop_name,
                "Status": "Success"
            }
        except Exception as e:
            return {"Title": "N/A", "Price": 0, "Shop": "N/A", "Status": f"Error: {str(e)}"}
        finally:
            browser.close()
```
Etsy's DOM structure is generally stable, but it relies on utility classes like wt-text-body-01, so expect to adjust selectors occasionally. Using .strip() and a regex ensures the price becomes a clean float, making it easier to analyze in a spreadsheet.
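The price cleanup is worth testing in isolation, since Etsy renders prices with currency symbols, thousands separators, and a trailing "+" for variable pricing. A small sketch of the same logic (the function name is my own):

```python
import re

def clean_price(raw_price):
    """Strip currency symbols, commas, and trailing '+' from a price string."""
    digits = re.sub(r'[^\d.]', '', raw_price.strip())
    return float(digits) if digits else 0.0
```

For example, clean_price("$1,025.00") and clean_price("$25.00+") both come back as plain floats, and an empty string safely falls back to 0.0.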
Phase 3: Connecting Python to Google Sheets
We’ll use gspread to handle the spreadsheet communication. The script reads URLs from the first column and writes results starting from the second column.
Create a file named sheets_config.py:
```python
import gspread
from oauth2client.service_account import ServiceAccountCredentials

def get_sheet(sheet_name):
    scope = ["https://spreadsheets.google.com/feeds", "https://www.googleapis.com/auth/drive"]
    creds = ServiceAccountCredentials.from_json_keyfile_name("credentials.json", scope)
    client = gspread.authorize(creds)
    return client.open(sheet_name).sheet1
```
This setup uses your credentials.json file. The scope defines the permissions, allowing the script to read and write spreadsheets and the associated Drive files. Note that oauth2client is deprecated; recent versions of gspread can also load the same key file directly via gspread.service_account(filename="credentials.json") if you prefer.
Phase 4: The Integration Logic
The real power of this tool comes from the integration logic. Instead of just dumping data, the script:
- Checks the spreadsheet for URLs.
- Skips URLs that have already been scraped.
- Scrapes only the missing data.
- Updates the sheet using a batch update.
Batch updates are essential. If you update the sheet cell-by-cell in a loop, each write is a separate API call, and you'll quickly hit Google's rate limits (the Sheets API caps write requests at roughly 60 per minute per user).
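gspread's Worksheet.batch_update expects a list of dictionaries, each pairing an A1-style range with a nested list of values. The payload construction can be sketched as a pure function (the name and input shape are my own, for illustration):

```python
def build_batch_updates(results):
    """Turn {row_number: [title, price, shop, status]} into a
    gspread-style batch_update payload (one entry per row)."""
    return [
        {"range": f"B{row}:E{row}", "values": [values]}
        for row, values in sorted(results.items())
    ]
```

Because the whole payload goes out in a single HTTP request, fifty scraped rows still cost only one write call against the quota.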
Here is the complete main.py:
```python
from scraper import scrape_etsy_product
from sheets_config import get_sheet
import time

SHEET_NAME = "Etsy Competitor Tracker"

def main():
    sheet = get_sheet(SHEET_NAME)
    # Column A: URLs, B: Title, C: Price, D: Shop, E: Status
    all_rows = sheet.get_all_values()
    if not all_rows:
        print("Sheet is empty.")
        return

    data_rows = all_rows[1:]  # Skip header
    updates = []

    for index, row in enumerate(data_rows, start=2):  # Sheets is 1-indexed
        url = row[0]
        # Only scrape if the URL exists and Title (Col B) is empty
        if url and (len(row) < 2 or not row[1]):
            print(f"Scraping: {url}")
            result = scrape_etsy_product(url)
            row_update = [result['Title'], result['Price'], result['Shop'], result['Status']]
            range_label = f"B{index}:E{index}"
            updates.append({'range': range_label, 'values': [row_update]})
            # Brief pause to respect Etsy's rate limits
            time.sleep(2)

    if updates:
        print(f"Updating {len(updates)} rows in Google Sheets...")
        sheet.batch_update(updates)
        print("Done!")
    else:
        print("No new URLs to scrape.")

if __name__ == "__main__":
    main()
```
How it works
- Optimization: By checking if the "Title" column is filled, the script avoids re-scraping the same data. Users can add new URLs to the bottom of the list and only the new entries will be processed.
- Error Handling: If a product is sold out or a URL is dead, the script catches the error and writes it to the "Status" column, providing immediate feedback to the user.
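The skip condition from main.py can be factored into a small predicate, which makes the dedup rule explicit and easy to test (the helper name is my own, not from the script above):

```python
def needs_scrape(row):
    """True when column A has a URL but column B (Title) is still empty."""
    url = row[0] if row else ""
    title = row[1] if len(row) > 1 else ""
    return bool(url) and not title
```

Rows with a URL and a blank title get scraped; filled rows and blank rows are left alone.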
Phase 5: Making it Accessible
To make this a true self-serve tool, the team needs an easy way to run it.
Local Execution
Create a simple batch file (Windows) or shell script (Mac/Linux) on the desktop. A user can double-click it to run the scraper and update the sheet.
Windows (run_tracker.bat):
```bat
@echo off
python main.py
pause
```
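Mac/Linux (run_tracker.sh): a minimal equivalent, assuming python3 is on the PATH; make it executable with chmod +x first.

```shell
#!/usr/bin/env bash
# Double-clickable launcher: run the tracker, then keep the window open.
cd "$(dirname "$0")"
python3 main.py
read -p "Press Enter to close..."
```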
GitHub Actions
For a more professional setup, host the code on GitHub and use GitHub Actions. You can set the script to run on a schedule (like every morning at 8 AM) or via a manual trigger. This way, users don't need Python installed on their computers at all.
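A workflow along these lines could look like the sketch below. The file path, secret name (GOOGLE_CREDENTIALS), and schedule are all illustrative; the secret holds the contents of your credentials.json, added under the repository's Settings > Secrets.

```yaml
# .github/workflows/tracker.yml -- names and schedule are illustrative
name: Etsy Tracker
on:
  schedule:
    - cron: "0 8 * * *"   # every day at 08:00 UTC
  workflow_dispatch:       # manual "Run workflow" button
jobs:
  scrape:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install playwright gspread oauth2client
      - run: playwright install chromium
      # Recreate credentials.json from a repository secret
      - run: echo "$GOOGLE_CREDENTIALS" > credentials.json
        env:
          GOOGLE_CREDENTIALS: ${{ secrets.GOOGLE_CREDENTIALS }}
      - run: python main.py
```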
To Wrap Up
This tool transforms a simple scraping script into a functional internal product. You've empowered your team to gather their own data without needing to understand HTML selectors or API authentication.
Key Takeaways:
- Service Accounts bridge Python and Google Cloud; always share the sheet with the service account email.
- Playwright provides the reliability needed for high-traffic e-commerce sites.
- Batch Updates are required for performance and to stay within API limits.
- Deduplication logic prevents redundant work and saves resources.
Try adding historical price tracking by creating a new tab for each day, or use smtplib to send email alerts when a competitor's price drops.
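For the email-alert idea, the message construction can be separated from the sending, which keeps it testable. A hedged sketch (addresses and the function name are placeholders; actual delivery via smtplib is left as the commented one-liner):

```python
from email.message import EmailMessage

def price_drop_alert(product, old_price, new_price):
    """Build (not send) an email alert for a competitor price drop."""
    msg = EmailMessage()
    msg["Subject"] = f"Price drop: {product} is now ${new_price:.2f}"
    msg["From"] = "tracker@example.com"
    msg["To"] = "team@example.com"
    msg.set_content(
        f"{product} dropped from ${old_price:.2f} to ${new_price:.2f}.\n"
    )
    return msg

# To send: smtplib.SMTP_SSL("smtp.example.com").send_message(msg)
```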
To speed up development and generate reliable scraping logic with AI, you can also use the AI Web Scraping Assistant.