In e-commerce, being the "best" product doesn't matter if no one can find you. For brands whose products appear on large retail sites like IKEA, visibility is the primary currency. If your ergonomic office chair is buried on page five of the search results, your sales will reflect that invisibility, regardless of your price or build quality.
This concept is known as Share of Search or Digital Shelf Analytics. It involves measuring how often and how prominently your products appear for specific keywords compared to your competitors. While standard web scraping focuses on what a product is (price or description), rank tracking focuses on where a product is.
This guide walks through building a Python-based IKEA scraper specifically to track search rankings. You will learn how to monitor a product's position for target keywords and calculate visibility over time.
Prerequisites & Setup
You’ll need a basic understanding of Python and JSON data. We will use a few key libraries to handle requests and data organization.
Install the necessary dependencies:
```bash
pip install requests beautifulsoup4 pandas
```
IKEA employs sophisticated anti-bot measures and dynamic content loading. A standard requests.get() call often results in a 403 Forbidden error or an empty page. We will use the ScrapeOps Proxy API to handle proxy rotation, browser headers, and JavaScript rendering.
If you don't have one, you can get a free API key from the ScrapeOps website.
The Strategy: Rank vs. Data
Most developers approach scraping as a data extraction task, such as fetching the price of a specific SKU. Tracking Share of Search requires a different logic. We aren't just looking for a product; we are looking for its Rank Position.
The challenge with IKEA is that results are paginated or use infinite scroll. A product at the top of Page 2 is effectively at Rank 25 if the site displays 24 items per page. To get an accurate Rank Position, we must:
- Scrape search results in the order they appear to the user.
- Maintain a global counter across multiple pages.
- Identify target SKUs within that ordered list.
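The global counter boils down to simple arithmetic: an item's absolute rank is derived from its page number and its position within that page. A minimal sketch, assuming IKEA's default of 24 items per page:

```python
def absolute_rank(page_number, index_on_page, items_per_page=24):
    """Convert a 1-based page number and a 0-based on-page index
    into a site-wide rank position."""
    return (page_number - 1) * items_per_page + index_on_page + 1

# The first item on page 2 of a 24-per-page grid is rank 25
print(absolute_rank(2, 0))  # 25
```

This is the same calculation the parser in Step 3 performs via its `start_rank` offset.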
Step 1: Configuring the Target List
Before writing the scraper, we need to define our targets. We’ll create a configuration object that maps generic category keywords to the specific Product IDs (SKUs) we want to monitor.
```python
# config.py

# Keywords to track
TARGET_KEYWORDS = ["standing desk", "velvet sofa", "gaming chair"]

# The SKUs (Product IDs) we are tracking (ours or competitors')
MY_PRODUCTS = {
    "standing desk": ["S49440193", "S19222267"],
    "velvet sofa": ["80430535"],
    "gaming chair": ["70483818"],
}
```
Step 2: The Search Scraper
IKEA’s search page is dynamic. To get the same results a human sees, the request must mimic a real browser. We will route requests through ScrapeOps to handle JavaScript rendering and bypass anti-bot protections.
The following function takes a keyword and a page number, then returns the HTML content.
```python
import requests
from urllib.parse import urlencode

SCRAPEOPS_API_KEY = 'YOUR_API_KEY'

def get_ikea_search_page(keyword, page_number=1):
    base_url = f"https://www.ikea.com/us/en/search/?q={keyword}&p={page_number}"
    payload = {
        'api_key': SCRAPEOPS_API_KEY,
        'url': base_url,
        'render_js': 'true',  # IKEA requires JS to load product grids
    }
    proxy_url = 'https://proxy.scrapeops.io/v1/?' + urlencode(payload)
    try:
        # Rendered pages can take a while to return; allow a generous timeout
        response = requests.get(proxy_url, timeout=120)
        if response.status_code == 200:
            return response.text
        return None
    except Exception as e:
        print(f"Error fetching page {page_number}: {e}")
        return None
```
Step 3: Parsing Ranks and Handling Pagination
Next, we parse the HTML, find product containers, and calculate their absolute rank. IKEA typically uses data-product-number attributes in its product markup, which makes identification straightforward.
We use a start_rank variable to ensure positions are calculated correctly across pages. If we are on Page 2 and there are 24 items per page, the first item on that page is Rank 25.
```python
from bs4 import BeautifulSoup

def parse_search_results(html, current_page, items_per_page=24):
    soup = BeautifulSoup(html, 'html.parser')
    products = soup.select('.plp-fragment-wrapper')
    results = []
    start_rank = (current_page - 1) * items_per_page

    for index, item in enumerate(products):
        product_id = item.get('data-product-number') or ""
        product_name = item.select_one('.plp-price-module__product-name')
        if product_id:
            results.append({
                'product_id': product_id.strip(),
                'name': product_name.text.strip() if product_name else "Unknown",
                'rank': start_rank + index + 1
            })
    return results
```
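Before pointing the parser at live pages, it helps to sanity-check the selector logic against a small hand-written snippet. The markup below is illustrative (the class names mirror those used above but are assumptions about IKEA's current HTML, which can change):

```python
from bs4 import BeautifulSoup

# Minimal stand-in for an IKEA search results grid (illustrative markup)
sample_html = """
<div class="plp-fragment-wrapper" data-product-number="S49440193">
  <span class="plp-price-module__product-name">TROTTEN</span>
</div>
<div class="plp-fragment-wrapper" data-product-number="80430535">
  <span class="plp-price-module__product-name">SODERHAMN</span>
</div>
"""

soup = BeautifulSoup(sample_html, 'html.parser')
items = soup.select('.plp-fragment-wrapper')
parsed = [
    {
        'product_id': item['data-product-number'],
        'name': item.select_one('.plp-price-module__product-name').text,
        'rank': index + 1,  # page 1, so the start_rank offset is 0
    }
    for index, item in enumerate(items)
]
print(parsed)
```

If IKEA renames these classes, this is the first place the breakage will show up, which makes it a useful regression test.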
Step 4: Structuring Data for Analysis
To make this data useful, we need to store it for time-series analysis. We'll combine the previous steps into a loop that checks target SKUs against the scraped ranks and saves the results to a CSV.
```python
import os

import pandas as pd
from datetime import datetime

from config import TARGET_KEYWORDS, MY_PRODUCTS

all_rankings = []
timestamp = datetime.now().strftime("%Y-%m-%d %H:%M")

for keyword in TARGET_KEYWORDS:
    print(f"Tracking rank for: {keyword}")
    # Check the first 2 pages (top 48 results)
    for page in range(1, 3):
        html = get_ikea_search_page(keyword, page)
        if not html:
            continue
        page_results = parse_search_results(html, page)
        for item in page_results:
            if item['product_id'] in MY_PRODUCTS.get(keyword, []):
                all_rankings.append({
                    'date': timestamp,
                    'keyword': keyword,
                    'product_id': item['product_id'],
                    'product_name': item['name'],
                    'rank_position': item['rank']
                })

# Append to CSV, writing the header only if the file doesn't exist yet
df = pd.DataFrame(all_rankings)
write_header = not os.path.exists('ikea_rankings.csv')
df.to_csv('ikea_rankings.csv', mode='a', index=False, header=write_header)
```
Step 5: Analyzing Volatility
After collecting data for a few days, you can analyze Rank Volatility. If a product consistently appears in the top 10 for "standing desk," you have a high share of search. A sudden drop to position 30 might indicate a competitor has optimized their listing or IKEA has updated its algorithm.
Use pandas to calculate average positions and identify how often products hit the "Top 10."
```python
import pandas as pd

data = pd.read_csv('ikea_rankings.csv')

# Calculate average rank per product
avg_ranks = data.groupby(['keyword', 'product_id'])['rank_position'].mean()
print(avg_ranks)

# Count Top 10 appearances
top_10_count = data[data['rank_position'] <= 10].shape[0]
print(f"Total Top 10 Placements: {top_10_count}")
```
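Volatility itself can be quantified as the standard deviation of rank over time, and sudden drops can be flagged by looking at day-over-day movement. A sketch using an in-memory DataFrame with illustrative sample values:

```python
import pandas as pd

# Illustrative ranking history for one SKU across four days
history = pd.DataFrame({
    'date': ['2024-01-01', '2024-01-02', '2024-01-03', '2024-01-04'],
    'keyword': ['standing desk'] * 4,
    'product_id': ['S49440193'] * 4,
    'rank_position': [3, 4, 3, 28],  # sudden drop on the last day
})

# Standard deviation of rank per product: higher means more volatile
volatility = history.groupby(['keyword', 'product_id'])['rank_position'].std()
print(volatility)

# Flag days where the rank moved more than 10 positions overnight
history['rank_change'] = history['rank_position'].diff().abs()
alerts = history[history['rank_change'] > 10]
print(alerts[['date', 'rank_position']])
```

A stable top-10 product will show a small standard deviation; the 25-position jump above is the kind of movement worth investigating manually.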
Practical Tips
When building a rank tracker for IKEA, keep these factors in mind:
- Regional Differences: Search results vary by zip code based on local stock. If you are tracking a specific market, ensure your requests use the correct localized IKEA URLs or parameters.
- Sponsored Results: IKEA mixes organic results with "Sponsored" or "New" labels. Most analysts count these as they occupy physical space on the digital shelf, but you can filter them out if you only care about organic performance.
- Frequency: Search rankings rarely fluctuate by the hour. Running this script once every 24 hours is usually enough to identify trends without consuming unnecessary API credits.
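If you do want organic-only numbers, one approach is to tag each parsed item with a sponsored flag and re-rank the remainder. How you detect the "Sponsored" badge depends on IKEA's markup, so the flag below is an assumption; the filtering logic itself is generic:

```python
def organic_only(results):
    """Drop items flagged as sponsored, then re-rank the rest
    so organic positions stay contiguous."""
    organic = [r for r in results if not r.get('sponsored', False)]
    for new_rank, item in enumerate(organic, start=1):
        item['organic_rank'] = new_rank
    return organic

# 'sponsored' would be set during parsing based on the badge markup
results = [
    {'product_id': 'A', 'rank': 1, 'sponsored': True},
    {'product_id': 'B', 'rank': 2, 'sponsored': False},
    {'product_id': 'C', 'rank': 3, 'sponsored': False},
]
print(organic_only(results))
```

Keeping both the raw rank and the organic rank lets you report on either view of the shelf without re-scraping.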
To wrap up, monitoring Share of Search on IKEA.com turns web scraping into a competitive intelligence tool. By focusing on rank position rather than just product details, you get a clear view of your brand's actual performance on the digital shelf.