<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0" xmlns:atom="http://www.w3.org/2005/Atom" xmlns:dc="http://purl.org/dc/elements/1.1/">
  <channel>
    <title>DEV Community: Sacha Dumay</title>
    <description>The latest articles on DEV Community by Sacha Dumay (@dumay_sacha).</description>
    <link>https://dev.to/dumay_sacha</link>
    <image>
      <url>https://media2.dev.to/dynamic/image/width=90,height=90,fit=cover,gravity=auto,format=auto/https:%2F%2Fdev-to-uploads.s3.amazonaws.com%2Fuploads%2Fuser%2Fprofile_image%2F2271828%2Feb76d4e6-fc1e-46ac-af25-e8ea4a37cae5.jpg</url>
      <title>DEV Community: Sacha Dumay</title>
      <link>https://dev.to/dumay_sacha</link>
    </image>
    <atom:link rel="self" type="application/rss+xml" href="https://dev.to/feed/dumay_sacha"/>
    <language>en</language>
    <item>
      <title>How to Get Your Site on AI Search (ChatGPT, Bing, Perplexity) with IndexNow</title>
      <dc:creator>Sacha Dumay</dc:creator>
      <pubDate>Sat, 31 May 2025 03:56:30 +0000</pubDate>
      <link>https://dev.to/dumay_sacha/how-to-get-your-site-on-ai-search-chatgpt-bing-perplexity-with-indexnow-3bm1</link>
      <guid>https://dev.to/dumay_sacha/how-to-get-your-site-on-ai-search-chatgpt-bing-perplexity-with-indexnow-3bm1</guid>
      <description>&lt;h2&gt;
  
  
  What I learned after 15 days of SEO: How I got my site indexed by ChatGPT, Bing, and other AI search tools — and how you can too
&lt;/h2&gt;

&lt;h2&gt;
  
  
  Get found by AI with IndexNow
&lt;/h2&gt;

&lt;p&gt;Most sites aren’t visible in AI search engines like ChatGPT, Perplexity, or Copilot.&lt;br&gt;&lt;br&gt;
Here’s how I fixed that for my startup (and how you can too).&lt;/p&gt;




&lt;h2&gt;
  
  
  Why This Matters
&lt;/h2&gt;

&lt;p&gt;AI search is growing fast, and many expect it to rival or even surpass traditional Google search within a few years. Starting to care now can give you a small but meaningful edge over your competitors, especially since, as you’ll see, it requires relatively low effort for potentially high ROI.&lt;/p&gt;

&lt;p&gt;First, Microsoft is OpenAI’s largest investor, so it’s no surprise that ChatGPT relies on Bing to search and browse online. This means if your website is well indexed by Google but not by Bing, you’re missing out on a lot of potential visibility in AI search tools such as:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;ChatGPT&lt;/li&gt;
&lt;li&gt;Perplexity&lt;/li&gt;
&lt;li&gt;Copilot&lt;/li&gt;
&lt;li&gt;Any AI tool that taps into Bing’s data&lt;/li&gt;
&lt;/ul&gt;




&lt;h2&gt;
  
  
  How to Fix It in 2 Simple Steps
&lt;/h2&gt;

&lt;h3&gt;
  
  
  1. Submit Your Site to Bing Webmaster Tools
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Go to &lt;a href="https://bing.com/webmasters" rel="noopener noreferrer"&gt;bing.com/webmasters&lt;/a&gt;, Bing’s version of Google Search Console.&lt;/li&gt;
&lt;li&gt;You can even import your existing site from Google Search Console with one click.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;This step officially links your site to Bing. Done.&lt;/p&gt;

&lt;h3&gt;
  
  
  2. Submit Your Sitemap
&lt;/h3&gt;

&lt;ul&gt;
&lt;li&gt;Once set up, simply submit your sitemap (&lt;code&gt;sitemap.xml&lt;/code&gt;). This tells Bing to start indexing all your pages.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;And voilà, you’re now visible to Bing, which means ChatGPT and other AI search tools can access your content. Your site can now potentially reach hundreds of millions of users.&lt;/p&gt;




&lt;h2&gt;
  
  
  BONUS: Automate Instant Indexing with IndexNow
&lt;/h2&gt;

&lt;p&gt;In a perfect world, every time you publish a new page or update your site, it would get indexed instantly.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;With Google, you have to manually request it.&lt;/li&gt;
&lt;li&gt;With Bing, you can automate it using the IndexNow protocol.&lt;/li&gt;
&lt;/ul&gt;

&lt;p&gt;IndexNow is an open protocol backed by Microsoft Bing and Yandex, so there’s no downside or risk of being flagged as spam.&lt;br&gt;&lt;br&gt;
You’re simply letting Bing know: &lt;strong&gt;“Hey, I added this page.”&lt;/strong&gt;&lt;/p&gt;

&lt;p&gt;IndexNow is supported by:&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Bing&lt;/li&gt;
&lt;li&gt;Yandex (mainly in Russia)&lt;/li&gt;
&lt;li&gt;Naver (mainly in South Korea)&lt;/li&gt;
&lt;/ul&gt;
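
&lt;p&gt;The protocol itself is tiny: notifying the search engines about a single URL is just one GET request to the IndexNow endpoint with your page URL and your key. Here’s a minimal Python sketch (the page URL and key are placeholders; substitute your own):&lt;/p&gt;

```python
import urllib.parse

INDEXNOW_ENDPOINT = "https://api.indexnow.org/indexnow"

def build_indexnow_ping(url: str, key: str) -> str:
    """Build the single-URL IndexNow ping. Issuing a GET on the
    returned URL tells participating engines that `url` was
    added or updated."""
    params = urllib.parse.urlencode({"url": url, "key": key})
    return f"{INDEXNOW_ENDPOINT}?{params}"

# Example (this would make a real network request):
#   import requests
#   requests.get(build_indexnow_ping("https://example.com/new-page", "your-key"))
```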

&lt;h3&gt;
  
  
  How to Set It Up
&lt;/h3&gt;

&lt;p&gt;&lt;strong&gt;Option 1: Use a Platform&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
If you’re using Webflow, Shopify, or similar platforms, they support IndexNow either natively or via plugins.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 2: Use Ahrefs&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
Ahrefs supports IndexNow through their Boost upgrade (~$20/month extra).&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Option 3: Automate It with Code (What I Did)&lt;/strong&gt;&lt;br&gt;&lt;br&gt;
For &lt;a href="https://www.aithumbnail.so" rel="noopener noreferrer"&gt;AIThumbnail.so&lt;/a&gt;, I automated the entire process using GitHub Actions. Here's how:&lt;/p&gt;

&lt;ol&gt;
&lt;li&gt;Generate your IndexNow API key &lt;a href="https://www.bing.com/indexnow" rel="noopener noreferrer"&gt;here&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Use the &lt;a href="https://github.com/bojieyan/indexnow-action" rel="noopener noreferrer"&gt;bojieyan GitHub Action&lt;/a&gt; to run daily:

&lt;ul&gt;
&lt;li&gt;Every day at 2am, it checks my &lt;code&gt;sitemap.xml&lt;/code&gt; for pages with a recent &lt;code&gt;&amp;lt;lastmod&amp;gt;&lt;/code&gt; tag.&lt;/li&gt;
&lt;li&gt;If a page was updated in the last 24 hours, it submits it via the IndexNow API.&lt;/li&gt;
&lt;/ul&gt;
&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;That’s it. Now every new or updated page is immediately pushed to Bing — no manual work, no delay.&lt;/p&gt;
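
&lt;p&gt;If you’d rather script that check yourself than use a prebuilt Action, the logic is simple: parse the sitemap, keep the URLs whose lastmod falls within the last 24 hours, and POST them in one batch to the IndexNow endpoint. A rough Python sketch (the host and key are placeholders, and the actual POST is left commented out):&lt;/p&gt;

```python
import xml.etree.ElementTree as ET
from datetime import datetime, timedelta, timezone

SITEMAP_NS = {"sm": "http://www.sitemaps.org/schemas/sitemap/0.9"}

def recently_updated_urls(sitemap_xml: str, hours: int = 24) -> list:
    """Return the loc of every sitemap entry whose lastmod date
    falls within the last `hours` hours."""
    cutoff = datetime.now(timezone.utc) - timedelta(hours=hours)
    root = ET.fromstring(sitemap_xml)
    fresh = []
    for entry in root.findall("sm:url", SITEMAP_NS):
        loc = entry.findtext("sm:loc", namespaces=SITEMAP_NS)
        lastmod = entry.findtext("sm:lastmod", namespaces=SITEMAP_NS)
        if not (loc and lastmod):
            continue
        # lastmod may be a bare date ("2025-05-31") or a full ISO timestamp.
        modified = datetime.fromisoformat(lastmod.replace("Z", "+00:00"))
        if modified.tzinfo is None:
            modified = modified.replace(tzinfo=timezone.utc)
        if modified >= cutoff:
            fresh.append(loc)
    return fresh

def indexnow_payload(host: str, key: str, urls: list) -> dict:
    """JSON body for a batch IndexNow submission
    (POST to https://api.indexnow.org/indexnow)."""
    return {"host": host, "key": key, "urlList": urls}

# To actually submit (real network call, placeholder host/key):
#   import requests
#   urls = recently_updated_urls(open("sitemap.xml").read())
#   if urls:
#       requests.post("https://api.indexnow.org/indexnow",
#                     json=indexnow_payload("example.com", "your-key", urls))
```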




&lt;h2&gt;
  
  
  Startups are just a numbers game
&lt;/h2&gt;

&lt;p&gt;If this helped, give me a few claps and follow me. I’ll share more real lessons from building and growing my startup week by week.&lt;/p&gt;

&lt;p&gt;&lt;strong&gt;Thanks for reading!&lt;/strong&gt;&lt;/p&gt;

</description>
      <category>seo</category>
      <category>chatgpt</category>
      <category>githubcopilot</category>
      <category>indexnow</category>
    </item>
    <item>
      <title>How to Scrape Login-Protected Websites with Selenium (Step by Step Guide)</title>
      <dc:creator>Sacha Dumay</dc:creator>
      <pubDate>Wed, 30 Oct 2024 09:36:05 +0000</pubDate>
      <link>https://dev.to/dumay_sacha/how-to-scrape-login-protected-websites-with-selenium-step-by-step-guide-9cm</link>
      <guid>https://dev.to/dumay_sacha/how-to-scrape-login-protected-websites-with-selenium-step-by-step-guide-9cm</guid>
      <description>&lt;h2&gt;
  
  
  My Steps to Scrape a Password-Protected Website:
&lt;/h2&gt;

&lt;ol&gt;
&lt;li&gt;Capture the HTML form elements: username ID, password ID, and login button class&lt;/li&gt;
&lt;li&gt;Use a tool like requests or Selenium to automate the login: fill username, wait, fill password, wait, click login&lt;/li&gt;
&lt;li&gt;Store session cookies for authentication&lt;/li&gt;
&lt;li&gt;Continue scraping the authenticated pages&lt;/li&gt;
&lt;/ol&gt;

&lt;p&gt;&lt;em&gt;Disclaimer: I’ve built an API for this specific use case at &lt;a href="https://www.scrapewebapp.com/" rel="noopener noreferrer"&gt;https://www.scrapewebapp.com/&lt;/a&gt;. So if you want to just get it done fast, use it, otherwise read on.&lt;/em&gt;&lt;/p&gt;

&lt;p&gt;Let’s use this example: let’s say I want to scrape my own API key from my account at &lt;a href="https://www.scrapewebapp.com/" rel="noopener noreferrer"&gt;https://www.scrapewebapp.com/&lt;/a&gt;. It is on this page: &lt;a href="https://app.scrapewebapp.com/account/api_key" rel="noopener noreferrer"&gt;https://app.scrapewebapp.com/account/api_key&lt;/a&gt;&lt;/p&gt;

&lt;h2&gt;
  
  
  1. The Login Page
&lt;/h2&gt;

&lt;p&gt;First, you need to find the login page. Most websites respond with a redirect (typically a 3xx status such as 302 or 303) when you request a page behind a login, so if you try to scrape &lt;a href="https://app.scrapewebapp.com/account/api_key" rel="noopener noreferrer"&gt;https://app.scrapewebapp.com/account/api_key&lt;/a&gt; directly, you will automatically land on the login page &lt;a href="https://app.scrapewebapp.com/login" rel="noopener noreferrer"&gt;https://app.scrapewebapp.com/login&lt;/a&gt;. That makes it easy to discover the login page automatically when it isn’t already known.&lt;/p&gt;
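
&lt;p&gt;That redirect trick is easy to automate: request the protected page without following redirects and read the Location header. A small sketch with requests (&lt;code&gt;find_login_url&lt;/code&gt; is a helper name I made up for illustration):&lt;/p&gt;

```python
import requests

REDIRECT_CODES = (301, 302, 303, 307, 308)

def login_redirect_target(status_code: int, location: str, fallback: str) -> str:
    """Given a response's status code and Location header, return the
    login URL the site redirects logged-out visitors to (or the
    fallback URL if there was no redirect)."""
    if status_code in REDIRECT_CODES and location:
        return location
    return fallback

def find_login_url(protected_url: str) -> str:
    """Request a protected page without following redirects and read
    where the site sends logged-out visitors."""
    resp = requests.get(protected_url, allow_redirects=False, timeout=10)
    return login_redirect_target(
        resp.status_code, resp.headers.get("Location", ""), protected_url
    )
```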

&lt;p&gt;OK, now that we have the login page, we need to locate the username/email field, the password field, and the actual sign-in button. The best way is a simple script that finds the input elements by their type (“email”, “text”, “password”) and the button with type “submit”. Here’s the code:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from bs4 import BeautifulSoup


def extract_login_form(html_content: str):
    """
    Extracts the login form elements from the given HTML content and returns their CSS selectors.
    """
    soup = BeautifulSoup(html_content, "html.parser")

    # Finding the username/email field
    username_email = (
        soup.find("input", {"type": "email"})
        or soup.find("input", {"name": "username"})
        or soup.find("input", {"type": "text"})
    )  # Fallback to input type text if no email type is found

    # Finding the password field
    password = soup.find("input", {"type": "password"})

    # Finding the login button
    # Searching for buttons/input of type submit closest to the password or username field
    login_button = None

    # First try to find a submit button within the same form
    if password:
        form = password.find_parent("form")
        if form:
            login_button = form.find("button", {"type": "submit"}) or form.find(
                "input", {"type": "submit"}
            )
    # If no button is found in the form, fall back to finding any submit button
    if not login_button:
        login_button = soup.find("button", {"type": "submit"}) or soup.find(
            "input", {"type": "submit"}
        )

    # Extracting CSS selectors
    def generate_css_selector(element, element_type):
        if "id" in element.attrs:
            return f"#{element['id']}"
        elif "type" in element.attrs:
            return f"{element_type}[type='{element['type']}']"
        else:
            return element_type

    # Generate CSS selectors with the updated logic
    username_email_css_selector = None
    if username_email:
        username_email_css_selector = generate_css_selector(username_email, "input")

    password_css_selector = None
    if password:
        password_css_selector = generate_css_selector(password, "input")

    login_button_css_selector = None
    if login_button:
        login_button_css_selector = generate_css_selector(
            login_button, "button" if login_button.name == "button" else "input"
        )

    return username_email_css_selector, password_css_selector, login_button_css_selector


def main(html_content: str):
    # Call the extract_login_form function and return its result
    return extract_login_form(html_content)
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  2. Using Selenium to Actually Log In
&lt;/h2&gt;

&lt;p&gt;Now you need to create a Selenium WebDriver. We will run headless Chrome from Python. This is how to install it:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Install selenium and chromium

!pip install selenium
!apt-get update 
!apt install chromium-chromedriver

!cp /usr/lib/chromium-browser/chromedriver /usr/bin
import sys
sys.path.insert(0,'/usr/lib/chromium-browser/chromedriver')
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Then actually log into our website and save the cookies. We will save all cookies, but you could only save the auth cookies if you wanted.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;# Imports
from selenium import webdriver
from selenium.webdriver.common.by import By
import requests
import time

# Set up Chrome options
chrome_options = webdriver.ChromeOptions()
chrome_options.add_argument('--headless')
chrome_options.add_argument('--no-sandbox')
chrome_options.add_argument('--disable-dev-shm-usage')

# Initialize the WebDriver
driver = webdriver.Chrome(options=chrome_options)

try:
    # Open the login page
    driver.get("https://app.scrapewebapp.com/login")

    # Find the email input field by ID and input your email
    email_input = driver.find_element(By.ID, "email")
    email_input.send_keys("******@gmail.com")

    # Find the password input field by ID and input your password
    password_input = driver.find_element(By.ID, "password")
    password_input.send_keys("*******")

    # Find the login button and submit the form
    login_button = driver.find_element(By.CSS_SELECTOR, "button[type='submit']")
    login_button.click()

    # Wait for the login process to complete
    time.sleep(5)  # Adjust this depending on your site's response time

except Exception:
    # If anything goes wrong, close the browser before re-raising
    driver.quit()
    raise

# Note: don't call driver.quit() here -- we still need this session's
# cookies. Close the browser only after saving them (next step).
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  3. Store Cookies
&lt;/h2&gt;

&lt;p&gt;It is as simple as saving them into a dictionary with the driver’s &lt;code&gt;get_cookies()&lt;/code&gt; method.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def save_cookies(driver):
    """Save cookies from the Selenium WebDriver into a dictionary."""
    cookies = driver.get_cookies()
    cookie_dict = {}
    for cookie in cookies:
        cookie_dict[cookie['name']] = cookie['value']
    return cookie_dict
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;Save the cookies from the WebDriver, then close the browser:&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;cookies = save_cookies(driver)
driver.quit()
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;

&lt;h2&gt;
  
  
  4. Get Data from Our Logged-in Session
&lt;/h2&gt;

&lt;p&gt;In this part, we will use the &lt;code&gt;requests&lt;/code&gt; library, but you could keep using Selenium too.&lt;/p&gt;

&lt;p&gt;Now we want to get the actual API key from this page: &lt;a href="https://app.scrapewebapp.com/account/api_key" rel="noopener noreferrer"&gt;https://app.scrapewebapp.com/account/api_key&lt;/a&gt;.&lt;/p&gt;

&lt;p&gt;So we create a session from the requests library and add each cookie into it. Then request the URL and print the response text.&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;def scrape_api_key(cookies):
    """Use cookies to scrape the /account/api_key page."""
    url = 'https://app.scrapewebapp.com/account/api_key'

    # Set up the session to persist cookies
    session = requests.Session()

    # Add cookies from Selenium to the requests session
    for name, value in cookies.items():
        session.cookies.set(name, value)

    # Make the request to the /account/api_key page
    response = session.get(url)

    # Check if the request is successful
    if response.status_code == 200:
        print("API Key page content:")
        print(response.text)  # Print the page content (could contain the API key)
    else:
        print(f"Failed to retrieve API key page, status code: {response.status_code}")
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;h2&gt;
  
  
  5. Get the Actual Data You Want (BONUS)
&lt;/h2&gt;

&lt;p&gt;We got the page text we wanted, but there is a lot of data that we do not care about. We just want the api_key.&lt;/p&gt;

&lt;p&gt;One easy way to do that is to use an LLM such as OpenAI’s GPT-4o.&lt;/p&gt;

&lt;p&gt;Prompt the model like this: “You are an expert scraper and you will extract only the information asked from the context. I need the value of my api-key from {context}”&lt;br&gt;
&lt;/p&gt;

&lt;div class="highlight js-code-highlight"&gt;
&lt;pre class="highlight plaintext"&gt;&lt;code&gt;from openai import OpenAI

# Initialize OpenAI client
client = OpenAI(api_key="sk-proj-***********")

def extract_api_key_using_ai(response_text):
    """Use OpenAI's GPT model to extract the API key."""
    prompt = f"""
    You are an expert scraper, and you will extract only the information asked from the context. 
    I need the value of my api-key from the following context:

    {response_text}
    """

    try:
        # Use OpenAI client to create a chat completion
        chat_completion = client.chat.completions.create(
            messages=[
                {
                    "role": "user",
                    "content": prompt,
                }
            ],
            model="gpt-4o",
        )

        # Extract the response from the AI
        extracted_api_key = chat_completion.choices[0].message.content
        return extracted_api_key

    except Exception as e:
        print(f"An error occurred with OpenAI API: {e}")
        return None
&lt;/code&gt;&lt;/pre&gt;

&lt;/div&gt;



&lt;p&gt;If you want all of that in a simple and reliable API, please give my new product a try: &lt;a href="https://www.scrapewebapp.com/" rel="noopener noreferrer"&gt;https://www.scrapewebapp.com/&lt;/a&gt;&lt;/p&gt;

&lt;p&gt;If you like this post, please give me claps and follow me. It does help a lot!&lt;/p&gt;

</description>
      <category>webscraping</category>
      <category>ai</category>
      <category>python</category>
      <category>selenium</category>
    </item>
  </channel>
</rss>
