Kenan Can

SEO Performance Analysis Tool: AI-Powered SEO Insights with Complex Web Scraping

This is a submission for the Bright Data Web Scraping Challenge, qualifying for two prompts:

  1. Scrape Data from Complex, Interactive Websites
  2. Most Creative Use of Web Data for AI Models

What I Built

Meet the SEO Performance Analysis Tool: A comprehensive SEO analytics platform that combines complex web scraping with AI-powered insights. This tool helps SEO professionals and content creators optimize their websites by:

  • Analyzing website performance using Google Lighthouse metrics
  • Identifying and analyzing top competitors
  • Providing AI-powered content optimization suggestions
  • Generating detailed SEO reports

Key Features:

  • πŸ“Š Lighthouse Performance Analysis: Mobile and desktop performance metrics, accessibility scores, and SEO ratings
  • πŸ” Competitor Analysis: Automatic competitor detection and content comparison
  • πŸ“ Content Analysis: AI-powered structural analysis and SEO recommendations
  • πŸ“ˆ Visual Reports: Interactive charts and comparative analysis
  • πŸ€– AI Integration: Google Gemini AI for intelligent content analysis

Demo

Live Demo: SEO Performance Analysis Tool

Source Code: GitHub Repository

Screenshots

  1. Main Interface: Clean and intuitive interface for URL and keyword input

  2. Lighthouse Analysis: Complex web scraping in action, showing performance metrics

  3. Competitor Analysis: AI-powered competitor content comparison

  4. Content Analysis: Detailed content optimization recommendations

How I Used Bright Data

1. Complex Web Scraping with Scraping Browser

The tool leverages Bright Data's Scraping Browser to handle complex, JavaScript-heavy websites:

# lighthouse.py
from selenium.webdriver import Remote, ChromeOptions
from selenium.webdriver.chromium.remote_connection import ChromiumRemoteConnection
from selenium.webdriver.common.action_chains import ActionChains
from selenium.webdriver.common.by import By
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.support.ui import WebDriverWait

def get_lighthouse(target_url: str):
    # SBR_WEBDRIVER is the Scraping Browser endpoint from the Bright Data dashboard
    sbr_connection = ChromiumRemoteConnection(SBR_WEBDRIVER, 'goog', 'chrome')
    driver = Remote(sbr_connection, options=ChromeOptions())

    try:
        # Navigate to PageSpeed Insights
        encoded_url = f"https://pagespeed.web.dev/analysis?url={target_url}"
        driver.get(encoded_url)

        # Challenge 1: Wait for dynamic content loading
        WebDriverWait(driver, 60).until(
            EC.presence_of_element_located((By.CLASS_NAME, "lh-report"))
        )
        # Snapshot the mobile report so we can detect when it changes
        report_text = driver.find_element(By.CLASS_NAME, "lh-report").text

        # Challenge 2: Handle tab switching for desktop analysis
        desktop_tab = WebDriverWait(driver, 20).until(
            EC.element_to_be_clickable((By.ID, "desktop_tab"))
        )
        actions = ActionChains(driver)
        actions.move_to_element(desktop_tab).click().perform()

        # Challenge 3: Verify the report content changed after the tab switch
        WebDriverWait(driver, 20).until(
            lambda d: d.find_element(By.CLASS_NAME, "lh-report").text != report_text
        )
    finally:
        driver.quit()

Challenges Overcome:

  • Handling dynamic JavaScript content on PageSpeed Insights
  • Managing complex user interactions (tab switching between mobile/desktop)
  • Extracting structured data from interactive reports
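Once the report has rendered, the category scores can be pulled straight out of the page source. Here is a minimal sketch of that extraction; the `lh-gauge__*` class names are assumptions based on Lighthouse's report markup, which Google may change at any time:

```python
from bs4 import BeautifulSoup

def parse_lighthouse_scores(report_html: str) -> dict:
    """Map category labels (Performance, SEO, ...) to their 0-100 scores."""
    soup = BeautifulSoup(report_html, "html.parser")
    scores = {}
    for gauge in soup.select(".lh-gauge__wrapper"):
        label = gauge.select_one(".lh-gauge__label")
        value = gauge.select_one(".lh-gauge__percentage")
        if label and value and value.text.strip().isdigit():
            scores[label.text.strip()] = int(value.text.strip())
    return scores

# Hypothetical fragment of a rendered report, for illustration only:
sample = """
<div class="lh-gauge__wrapper">
  <div class="lh-gauge__percentage">92</div>
  <div class="lh-gauge__label">Performance</div>
</div>
<div class="lh-gauge__wrapper">
  <div class="lh-gauge__percentage">88</div>
  <div class="lh-gauge__label">SEO</div>
</div>
"""
```

With this, `parse_lighthouse_scores(driver.page_source)` would yield a plain dict ready for charting in Streamlit.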

2. Web Unlocker for Competitor Analysis

Used Bright Data's Web Unlocker to access competitor content reliably:

# compare_pages.py - Competitor Content Access
import requests
from bs4 import BeautifulSoup

def fetch_html_content(url: str) -> tuple:
    try:
        # Ensure the URL has a proper scheme
        if not url.startswith(('http://', 'https://')):
            url = 'https://' + url

        # Bright Data API configuration
        api_url = "https://api.brightdata.com/request"
        headers = {
            "Content-Type": "application/json",
            "Authorization": f"Bearer {get_api_key('BRIGHTDATA_API_KEY')}"
        }
        payload = {
            "zone": "web_unlocker1",
            "url": url,
            "format": "raw"
        }

        # Make the request through Bright Data's Web Unlocker
        response = requests.post(api_url, json=payload, headers=headers)

        if response.status_code == 200:
            # Keep only the headings and paragraphs for the AI comparison
            soup = BeautifulSoup(response.text, 'html.parser')
            tags = soup.find_all(['h1', 'h2', 'h3', 'p'])
            return url, ''.join(str(tag) for tag in tags)
        return url, None
    except Exception as e:
        print(f"Error fetching HTML content from {url}: {e}")
        return url, None

3. SERP API for Competitor Discovery

Integrated Bright Data's SERP API to identify top competitors:

# compare_pages.py - Competitor Discovery
def get_top_competitor(keyword: str, our_domain: str) -> str:
    try:
        api_url = "https://api.brightdata.com/request"

        # Challenge: Get real-time SERP results and find a relevant competitor
        encoded_keyword = requests.utils.quote(keyword)

        payload = {
            "zone": "serp_api1",
            "url": f"https://www.google.com/search?q={encoded_keyword}",
            "format": "raw"
        }
        headers = {
            "Authorization": f"Bearer {get_api_key('BRIGHTDATA_API_KEY')}",
            "Content-Type": "application/json"
        }

        response = requests.post(api_url, json=payload, headers=headers)

        if response.status_code == 200:
            # Parse the organic results with BeautifulSoup
            soup = BeautifulSoup(response.text, 'html.parser')
            for result in soup.find_all("div", {"class": "g"}):
                anchor = result.find('a')
                link = anchor.get('href') if anchor else None
                # The first absolute link that isn't our own domain wins
                if link and link.startswith('http') and our_domain not in link:
                    return link
        return None

    except Exception as e:
        st.error(f"Error finding competitor: {str(e)}")
        return None

AI Integration Pipeline

  1. Data Collection: Use Bright Data services to gather:

    • Performance metrics (Lighthouse)
    • Competitor content
    • SERP data
  2. Data Processing: Structure collected data for AI analysis

  3. AI Analysis: Use Google Gemini AI to:

    • Compare content quality
    • Generate SEO recommendations
    • Analyze content structure
  4. Visualization: Present insights through Streamlit interface
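The heart of step 3 is turning the scraped pages into a single prompt for the model. A minimal sketch of that prompt assembly is below; the Gemini call itself is shown commented out, and the `google-generativeai` usage and model name are assumptions to verify against Google's docs:

```python
def build_comparison_prompt(our_content: str, competitor_content: str, keyword: str) -> str:
    """Assemble scraped content into one prompt for the LLM comparison step."""
    return (
        f"You are an SEO analyst. Target keyword: {keyword}\n\n"
        f"--- OUR PAGE ---\n{our_content}\n\n"
        f"--- COMPETITOR PAGE ---\n{competitor_content}\n\n"
        "Compare structure and content quality, then list concrete "
        "SEO recommendations for our page."
    )

# Sending it to Gemini might look like this (requires an API key):
# import google.generativeai as genai
# genai.configure(api_key=get_api_key("GEMINI_API_KEY"))
# model = genai.GenerativeModel("gemini-pro")
# response = model.generate_content(
#     build_comparison_prompt(our_html, competitor_html, "seo tools")
# )
# recommendations = response.text
```

Keeping prompt assembly in a pure function like this makes the AI step testable without spending API calls.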

Tech Stack

  • Frontend: Streamlit
  • Backend: Python
  • Scraping: Bright Data (Scraping Browser, Web Unlocker, SERP API)
  • AI: Google Gemini AI
  • Data Visualization: Plotly

Additional Prompt Qualifications

This project qualifies for two prompts:

  1. Scrape Data from Complex, Interactive Websites: The tool successfully handles JavaScript-heavy pages like PageSpeed Insights, managing dynamic content loading and complex user interactions through Bright Data's Scraping Browser.

  2. Most Creative Use of Web Data for AI Models: The project creates an innovative AI pipeline by combining web-scraped data (performance metrics, competitor content, SERP results) with Google Gemini AI to generate intelligent SEO insights and recommendations.

Team Submission

This submission was created by Kenan Can.

Thank you for reviewing my submission! Let's make SEO analysis smarter with the power of web scraping and AI.


Top comments (10)

Hilal Kara β€’

This project offers an excellent solution to the problem it addresses. Congratulations!

Kenan Can β€’

Thank you for your feedback! πŸ™

Can Uçanefe ‒

That's the spirit, and exactly what I've been looking for for a very long time... Thanks for the solution you made for all of us.

Kenan Can β€’

Thank you for your kind words! Glad it's helpful! πŸ™

Anl Egr β€’

This is great content. It gives very good tips on what to pay attention to in complex data extraction processes.

Kenan Can β€’

Thank you! Glad the insights about data extraction were helpful! πŸ™Œ

Melike Sultan Can β€’

Really enjoyed this! The combination of AI and web scraping for SEO offers great insights.

Kenan Can β€’

Thank you! Glad you found it useful! πŸ™Œ

Terraflop β€’

How would you integrate Bright Data's proxy service to target specific countries for gathering localized search engine results?

Kenan Can β€’

For country-specific targeting with Bright Data proxy, you can use the country parameter in your configuration:

payload = {
    "zone": "serp_api1",
    "country": "us",  # target country code
    "url": f"https://www.google.com/search?q={encoded_keyword}",
    "format": "raw"
}
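To make that targeting reusable, the payload construction can be factored into a small helper. This is a sketch built from the snippet above; treat the `country` parameter name and the `serp_api1` zone as values to verify against your Bright Data zone configuration:

```python
from urllib.parse import quote

def build_serp_payload(keyword: str, country: str = "us") -> dict:
    """Build a Bright Data SERP request payload targeting one country."""
    return {
        "zone": "serp_api1",
        "country": country,  # two-letter country code, e.g. "de", "jp"
        "url": f"https://www.google.com/search?q={quote(keyword)}",
        "format": "raw",
    }
```

Swapping the `country` argument then lets the same `get_top_competitor` flow surface localized competitors per market.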

