Srav Nayani

Posted on Oct 8

AI Agent Building Block: Web Automation

#ai #automation #agents

Code for this article is available at https://github.com/shravyanayani/automation

What is Artificial Intelligence

Artificial Intelligence (AI) refers to a computer system's ability to perform tasks that usually need human intelligence. These tasks include learning, reasoning, problem-solving, and understanding language.

The key components of AI are data, algorithms, and models.

Data provides the examples or information that AI learns from. Algorithms are the step-by-step methods that process this data to find patterns or make decisions.
The model is the outcome of the trained algorithms. It uses what it has learned from the data to predict or act on new inputs.
These components work together so AI systems can keep improving their performance and make smart decisions in real-world situations.

What is an AI Agent

An AI agent is a system that can sense its environment, make decisions, and act to reach specific goals, often without ongoing human help.

The key components of an AI agent include the perception module, decision-making module, and action module.

The perception module collects information from the environment using sensors or data inputs and makes sense of it.
The decision-making module uses algorithms or models to select the best action based on goals, rules, or past experiences.
Finally, the action module executes those decisions, and the agent continuously repeats this cycle to learn and improve over time.
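The perceive, decide, act cycle described above can be sketched in a few lines of Python. This is a minimal illustration rather than a real agent: the `PriceWatchAgent` class, its `target_price` goal, and the observation dictionaries that stand in for the environment are all hypothetical. It records what it has seen but does no actual learning.

```python
from dataclasses import dataclass, field


@dataclass
class PriceWatchAgent:
    """Hypothetical agent that watches a price and buys below a target."""
    target_price: int
    history: list = field(default_factory=list)

    def perceive(self, observation: dict) -> int:
        # Perception: turn raw input into something the agent can reason about.
        price = int(observation["price"])
        self.history.append(price)
        return price

    def decide(self, price: int) -> str:
        # Decision-making: a simple rule derived from the goal.
        return "buy" if price <= self.target_price else "wait"

    def act(self, decision: str) -> str:
        # Action: a real agent would drive a browser or call an API here.
        return f"action={decision}"

    def step(self, observation: dict) -> str:
        return self.act(self.decide(self.perceive(observation)))


def run(agent: PriceWatchAgent, observations: list) -> list:
    """Repeat the cycle once per observation and collect the actions."""
    return [agent.step(obs) for obs in observations]
```

Calling `run(PriceWatchAgent(target_price=200), [{"price": "250"}, {"price": "199"}])` walks the cycle twice, once per observation.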

What is Web Automation

Web automation uses software or scripts to automatically perform tasks on websites. This includes filling out forms, clicking buttons, scraping data, or testing web applications without needing manual effort.

The key components of web automation are the web driver or automation tool (such as Selenium), the scripts or test code, and the browser. The scripts tell the automation tool which actions to take. The automation tool then controls the browser to carry out those actions, and the browser renders pages and responds just as it would for a human user.

These components enable developers to test, monitor, or interact with websites effectively and reliably.

Web Automation vs API

API (Application Programming Interface) integration lets an AI agent interact directly with an external system's backend through structured requests. API integration is quick, efficient, and dependable. However, it only works when the external system exposes an API, and the AI agent must be granted access to it.

Web automation works through the same UI that human users see, so no API is required. Its drawbacks include occasional unreliability, slower execution, and the risk that a website blocks automated access.
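One way to combine the two approaches is to prefer an API when one is available and fall back to web automation otherwise. The sketch below assumes two hypothetical callables, `fetch_via_api` and `fetch_via_browser`, supplied by the caller; neither refers to a real service.

```python
def fetch_price(route, fetch_via_api=None, fetch_via_browser=None):
    """Return (price, source), preferring the API path over the browser path."""
    if fetch_via_api is not None:
        try:
            return fetch_via_api(route), "api"
        except PermissionError:
            # The API exists, but access has not been granted to the agent.
            pass
    if fetch_via_browser is not None:
        # Slower and less reliable, but needs no API at all.
        return fetch_via_browser(route), "browser"
    raise RuntimeError(f"no integration available for {route!r}")
```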

Web Automation vs Native App Automation

Web automation and native app automation involve using software to automatically test or interact with applications, but they target different platforms and use different tools.

Web automation focuses on automating actions in web browsers, such as Chrome or Edge. It uses tools like Selenium or Playwright. This method interacts with web elements, including HTML, CSS, and JavaScript, through a web driver. It works across various browsers and operating systems.

Native app automation, in contrast, targets mobile or desktop applications specifically designed for platforms like Android, iOS, or Windows. It employs tools like Appium, Espresso, or XCUITest, which communicate directly with the app’s native UI components instead of going through a browser.

Web automation relies on the browser's DOM (Document Object Model), while native app automation relies on UI elements defined by the operating system.

In short, web automation tests websites, while native app automation tests standalone apps. Both ensure that software functions correctly in their respective environments.

How Web Automation can be a building block for AI Agents

Web automation can be one of the most powerful capabilities of an AI system, because it lets the system act on the web and interact with it much as a human user would.

Through web automation, an AI agent can navigate sites, collect data, fill in forms, or start a process. This gives it access to up-to-date information and lets it perform online tasks without human intervention.

The automation layer does the actual clicking, typing, and scraping, while the AI layer supplies the intelligence by deciding what to do, why, and when, based on objectives or learned patterns.

For example, an AI agent may use natural language understanding to interpret a request (“book a flight to Austin”) and then use web automation to visit the relevant travel websites, compare prices, and make the booking.

Ultimately, combining AI-driven decision-making with web automation lets agents move from decision to action and apply their insights in the real world.
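The split between the AI layer and the automation layer can be sketched as follows. Everything here is an illustrative assumption: `plan_actions` is a hard-coded stand-in for a real language-understanding component, the URL is a placeholder, and `FakeBrowser` records actions instead of driving a browser.

```python
import re


def plan_actions(request: str) -> list:
    """Stand-in for the AI layer: turn a request into browser-level steps.

    A real agent would use a language model here; this sketch only
    recognises requests of the form "book a flight to <city>".
    """
    match = re.search(r"book a flight to (\w[\w ]*)", request, re.IGNORECASE)
    if not match:
        return []
    city = match.group(1).strip()
    return [
        ("open", "https://travel.example"),  # placeholder URL
        ("type", "destination", city),
        ("click", "search"),
    ]


class FakeBrowser:
    """Stand-in for the automation layer: records what it was told to do."""

    def __init__(self):
        self.log = []

    def open(self, url):
        self.log.append(f"open {url}")

    def type(self, field, text):
        self.log.append(f"type {text!r} into {field}")

    def click(self, element):
        self.log.append(f"click {element}")


def execute(plan: list, browser: FakeBrowser) -> list:
    """Dispatch each planned step to the matching browser method."""
    for name, *args in plan:
        getattr(browser, name)(*args)
    return browser.log
```

Keeping the plan as plain data means the automation layer could be swapped for a Selenium-backed class with the same three methods without touching the planning code.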

Technology Choices for Web Automation

The choice of technology for web automation depends mainly on the tasks to be performed, the platforms to be targeted, and the degree of intelligence or scalability required.

The main categories are programming languages, automation frameworks, and supporting tools.

Automation Frameworks – Selenium, Playwright, Cypress, and Puppeteer are the most widely used tools of this kind.

Selenium supports multiple browsers and several languages (Java, Python, C#, and others).
Playwright and Puppeteer are newer, often faster alternatives with built-in support for headless browsing and for running multiple browser sessions in parallel.
Cypress is a popular choice for front-end developers working with modern JavaScript frameworks.

Programming Languages – The usual picks are Python, Java, JavaScript, and C#; the choice depends on the development team's proficiency and on integration requirements.

Supporting Tools – Automation frameworks are commonly paired with a scheduler or trigger that runs the automation at set times or when a condition is met.

These technologies are complementary rather than competing: the language expresses the logic, the framework drives the browser, and the supporting tools integrate the automation with its environment.

Challenges with Web Automation

Here are a few key challenges with web automation:

Dynamic Web Elements – Modern websites often update their content with JavaScript or AJAX, which can change element IDs or page structure and break automation scripts.
Browser Compatibility – Browsers render the same page in slightly different ways, so scripts need to be tested in each target browser to confirm they behave consistently.
Synchronization Issues – "Element not found" errors can appear if the script runs even a fraction of a second before an element has loaded, so proper waiting or timing control is needed.
Maintenance Overhead – When a website's layout or functionality changes, the automation scripts must be updated before they can run again.
Authentication and Security Barriers – Security features such as captchas, multi-factor authentication, and rate limits can block automation.
Scalability and Performance – Large-scale automation (for example, running tests in parallel) can require significant resources and well-planned infrastructure.
Handling Non-Standard Elements – Unlike regular UI components, complex ones such as canvas elements, pop-ups, and drag-and-drop widgets can be hard to automate.

These challenges can make web automation difficult, but thoughtful design, robust frameworks, and continuous maintenance keep it a powerful tool.
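Synchronization problems are usually handled by polling for a condition rather than sleeping for a fixed time, which is the idea behind Selenium's WebDriverWait. The helper below is a library-independent sketch of that idea; `wait_until` and its parameters are illustrative names, not part of the Selenium API.

```python
import time


def wait_until(condition, timeout=5.0, poll_interval=0.1):
    """Call `condition` repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds pass.
    Exceptions raised by `condition` are treated as "not ready yet".
    """
    deadline = time.monotonic() + timeout
    last_error = None
    while True:
        try:
            result = condition()
            if result:
                return result
        except Exception as exc:  # e.g. an element lookup that fails early
            last_error = exc
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s") from last_error
        time.sleep(poll_interval)
```

Compared with a fixed `time.sleep`, this returns as soon as the condition holds and fails with a clear error when it never does.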

Coding Sample for Web Automation

The following Selenium script, written in Python, searches for flights on a travel site. It is for educational purposes only. It cannot be used against the real site, because travel websites are generally protected by anti-bot policies that present a captcha or puzzle to verify that a human is using the site. Web automation cannot pass this hurdle, so the script cannot run without human supervision.

from selenium import webdriver
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC
from selenium.webdriver.chrome.options import Options
import time

# Configuration
ORIGIN = "New York"  # Or airport code like "JFK"
DESTINATION = "Los Angeles"  # Or "LAX"
DEPARTURE_DATE = "15/10/2025"  # Format: DD/MM/YYYY (adjust based on site)

def setup_driver():
    """Set up Chrome driver in headless mode."""
    options = Options()
    options.add_argument("--headless")  # Run without UI
    options.add_argument("--no-sandbox")
    options.add_argument("--disable-dev-shm-usage")
    driver = webdriver.Chrome(options=options)
    return driver

def search_flights(driver):
    """Perform flight search."""
    wait = WebDriverWait(driver, 10)

    # Step 1: Navigate to Skyscanner
    driver.get("https://www.skyscanner.com/")
    time.sleep(2)  # Allow page load

    # Step 2: Select one-way trip (click if needed; Skyscanner defaults to round-trip, so toggle)
    try:
        one_way_button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[data-testid="trip-type-selector-one-way"]')))
        one_way_button.click()
        time.sleep(1)
    except Exception:  # avoid a bare except, which would also swallow KeyboardInterrupt
        print("One-way button not found; assuming default.")

    # Step 3: Enter origin
    origin_input = wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '[data-testid="origin-input"] input')))
    origin_input.clear()
    origin_input.send_keys(ORIGIN)
    time.sleep(1)

    # Click suggestion if appears (e.g., JFK)
    try:
        origin_suggestion = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[data-testid="suggestion-card"]')))
        origin_suggestion.click()
        time.sleep(1)
    except Exception:
        print("No origin suggestion; proceeding.")

    # Step 4: Enter destination
    dest_input = driver.find_element(By.CSS_SELECTOR, '[data-testid="destination-input"] input')
    dest_input.clear()
    dest_input.send_keys(DESTINATION)
    time.sleep(1)

    # Click suggestion (e.g., LAX)
    try:
        dest_suggestion = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[data-testid="suggestion-card"]')))
        dest_suggestion.click()
        time.sleep(1)
    except Exception:
        print("No destination suggestion; proceeding.")

    # Step 5: Enter departure date
    date_input = driver.find_element(By.CSS_SELECTOR, '[data-testid="date-picker"] input')
    date_input.clear()
    date_input.send_keys(DEPARTURE_DATE)
    time.sleep(1)

    # Select the date from calendar if pops up
    try:
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '[data-testid="date-picker-month"]')))
        # Click the day cell taken from DEPARTURE_DATE (month/year navigation is not handled)
        day = int(DEPARTURE_DATE.split("/")[0])
        specific_date = driver.find_element(By.XPATH, f"//td[@data-testid='day-{day}']")
        specific_date.click()
        time.sleep(1)
    except Exception:
        print("Date input direct; calendar may not have triggered.")

    # Step 6: Click search button
    search_button = wait.until(EC.element_to_be_clickable((By.CSS_SELECTOR, '[data-testid="search-button"]')))
    search_button.click()
    time.sleep(5)  # Allow results to load

def extract_lowest_price(driver):
    """Extract flight prices and find the lowest."""
    wait = WebDriverWait(driver, 10)
    flights = []

    try:
        # Wait for results to load
        wait.until(EC.presence_of_element_located((By.CSS_SELECTOR, '[data-testid="flight-result"]')))

        # Extract all flight cards
        flight_cards = driver.find_elements(By.CSS_SELECTOR, '[data-testid="flight-result"]')

        for card in flight_cards[:10]:  # Limit to top 10 for brevity
            try:
                price_elem = card.find_element(By.CSS_SELECTOR, '[data-testid="price"]')
                price_text = price_elem.text.strip().replace('$', '').replace(',', '')
                if price_text.isdigit():
                    price = int(price_text)
                    airline = card.find_element(By.CSS_SELECTOR, '[data-testid="airline"]').text
                    flights.append({'airline': airline, 'price': price})
            except Exception:  # skip cards that lack a price or airline element
                continue

        if flights:
            lowest = min(flights, key=lambda x: x['price'])
            print(f"Lowest priced flight: {lowest['airline']} for ${lowest['price']}")
            return lowest
        else:
            print("No prices extracted.")
            return None
    except Exception as e:
        print(f"Error extracting prices: {e}")
        return None

# Main execution
if __name__ == "__main__":
    driver = setup_driver()
    try:
        search_flights(driver)
        lowest_flight = extract_lowest_price(driver)
        if lowest_flight:
            print(f"Found lowest flight: {lowest_flight}")
        else:
            print("No flights found or error in extraction.")
    finally:
        driver.quit()

Testing Web Automation

Testing web automation code is essential to make sure it works consistently and can handle the situations it will meet in real use.

Web automation tests can use assertions to verify that expected elements have appeared, that extracted data is accurate, and that navigation steps have completed successfully.

To handle dynamic page loads, use an explicit wait condition (such as WebDriverWait) instead of fixed delays.

Run your automated actions on several browsers and devices to check that they behave the same way.
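One way to make automation code easier to test is to keep parsing logic in pure functions that can be checked with assertions without opening a browser. The `parse_price` and `lowest_flight` helpers below are a hypothetical refactoring of the price-cleaning and minimum-price steps in the script above, not part of the original code.

```python
def parse_price(text):
    """Convert a price label such as "$1,234" to an int, or None if it is not a price."""
    cleaned = text.strip().replace("$", "").replace(",", "")
    return int(cleaned) if cleaned.isdigit() else None


def lowest_flight(flights):
    """Return the cheapest flight dict, or None for an empty list."""
    return min(flights, key=lambda f: f["price"]) if flights else None


def test_parse_price():
    assert parse_price(" $1,234 ") == 1234
    assert parse_price("$89") == 89
    assert parse_price("Sold out") is None
    assert parse_price("") is None


def test_lowest_flight():
    flights = [{"airline": "A", "price": 310}, {"airline": "B", "price": 275}]
    assert lowest_flight(flights) == {"airline": "B", "price": 275}
    assert lowest_flight([]) is None
```

These tests can be collected by a runner such as pytest or called directly, and they keep passing even when the site's layout changes, because only the browser-facing code depends on selectors.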

Integrating Web Automation with AI Agent

A script for web automation is excellent for handling repetitive and predictable tasks, such as clicking on buttons, filling out forms, moving through pages, or extracting data.

Conversely, an AI agent is capable of logical thinking, planning, learning, and making judgments based on available data.

When you combine these two, you get a smart system where the AI is in charge of the operations and the automation carries out the tasks.

Typical AI Agent System Components

AI Decision Engine - Processes goals, rules, or user commands and decides the sequence of actions. It can use ML models, NLP, or rule-based systems.
Web Automation Layer - Executes low-level actions on web pages via tools like Selenium, Playwright, or Puppeteer. Handles clicks, inputs, scrolling, navigation.
Perception / Data Extraction Module - Observes the web environment and extracts relevant information (prices, flight options, stock data). Feeds this back to the AI agent.
Feedback / Learning Module - Evaluates outcomes of automated actions (e.g., did the AI pick the lowest flight price?) and updates decision-making models for future improvements.
Scheduler / Controller - Coordinates the flow: triggers web automation when needed, handles retries, logs progress, and ensures proper sequencing of tasks.
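A rough sketch of how these components might be wired together follows. All names (`run_task`, `decide`, `automate`, `extract`, `feedback`) are hypothetical, and each component is passed in as a plain function so that the controller logic can be read on its own.

```python
import logging

logger = logging.getLogger("agent")


def run_task(goal, decide, automate, extract, feedback, max_retries=2):
    """Controller: sequence the components and retry failed automation runs.

    decide(goal)           -> list of actions     (AI decision engine)
    automate(actions)      -> raw page data       (web automation layer)
    extract(raw)           -> structured result   (perception / data extraction)
    feedback(goal, result) -> None                (feedback / learning module)
    """
    actions = decide(goal)
    # One initial attempt plus up to `max_retries` retries.
    for attempt in range(1, max_retries + 2):
        try:
            raw = automate(actions)
        except Exception as exc:
            logger.warning("attempt %d failed: %s", attempt, exc)
            continue
        result = extract(raw)
        feedback(goal, result)
        return result
    return None  # every attempt failed
```

In this sketch only the automation step is retried, on the assumption that it is the part most likely to fail for transient reasons such as a slow page load.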

DEV Community