vishalmysore

Selenium and AI: UI Validations with AI

Automated UI testing has long been a staple in ensuring that web applications meet design and functionality standards. Traditional methods, such as using Selenium, involve interacting with the UI through specific code tied to the structure of the webpage. However, a new AI-driven method using Tools4AI is changing the game by allowing us to translate visual elements directly into code objects, simplifying the testing process considerably.

In my previous article, I wrote about how to trigger function calls based on images; you can read about that here.

The Traditional Selenium Way to Validate UI

Let's take a simple example to start with. Imagine you're testing a website for an auto repair shop.

With Selenium, you might write a test to check service prices:

WebElement oilChangePrice = driver.findElement(By.id("oilChangePrice"));
assert "29.99".equals(oilChangePrice.getText());

Conventionally, if you needed to verify that the “Full Inspection” service on the AutoServicingPro website is listed as “Starting at $99.99,” you would:

1. Use Selenium to locate the web element.
2. Extract the text from the element.
3. Assert the value against your expected text.

This method works well but has its drawbacks. Any change to the website's layout could break your test, requiring you to update the code.

The AI-Based Method: Option 1 With JSON

Incorporating AI to convert images to JSON is an innovative approach to streamline the data extraction process from user interfaces. This method bypasses the traditional need for manual element identification and data extraction, offering a more efficient solution.

First, you either already have the screenshot as a file or use Selenium to capture one:

import io.github.bonigarcia.wdm.WebDriverManager;
import org.openqa.selenium.OutputType;
import org.openqa.selenium.TakesScreenshot;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;
import org.openqa.selenium.chrome.ChromeOptions;

WebDriverManager.chromedriver().setup();

ChromeOptions options = new ChromeOptions();
options.addArguments("--headless");              // Run Chrome without a visible window
options.addArguments("--disable-gpu");           // GPU hardware acceleration isn't useful in headless mode
options.addArguments("--window-size=1920,1080"); // A fixed window size keeps screenshots consistent
WebDriver driver = new ChromeDriver(options);

driver.get("https://google.com");
// Your code to interact with the page goes here

// Take a screenshot and pass it on as bytes...
TakesScreenshot ts = (TakesScreenshot) driver;
byte[] screenshotBytes = ts.getScreenshotAs(OutputType.BYTES);
GeminiImageActionProcessor imageActionProcessor = new GeminiImageActionProcessor();
imageActionProcessor.imageToText(screenshotBytes);

// ...or save it as a file instead
//File srcFile = ts.getScreenshotAs(OutputType.FILE);
//File destFile = new File("screenshot.png");
//FileHandler.copy(srcFile, destFile);
driver.quit();
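If you would rather keep the screenshot on disk, as the commented-out lines hint, the bytes can also be written with the JDK alone. This is only a sketch: `ScreenshotStore` and `save` are hypothetical names that are not part of Selenium or Tools4AI, and the byte array in `main` stands in for real screenshot bytes.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

final class ScreenshotStore {

    /** Writes the screenshot bytes to the target file, replacing any existing content. */
    public static Path save(byte[] imageBytes, Path target) {
        try {
            return Files.write(target, imageBytes);
        } catch (IOException e) {
            // Rethrow unchecked so test code does not need a throws clause
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        // Placeholder bytes; in a real test these would come from getScreenshotAs(OutputType.BYTES)
        byte[] placeholder = {1, 2, 3, 4};
        Path file = Paths.get(System.getProperty("java.io.tmpdir"), "screenshot.png");
        System.out.println("Saved " + save(placeholder, file).toFile().length() + " bytes");
    }
}
```

Keeping the file around can help when a test fails, since you can open the exact image that was sent to the model.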

With the AI-based method, a tool processes the image of the website and returns a JSON string containing the necessary information:

String jsonStr = processor.imageToJson(
  GeminiImageExample.class.getClassLoader().getResource("auto.PNG"),
  "Full Inspection"
); // you can also feed the bytes of the image directly
// The returned JSON
System.out.println(jsonStr);

The returned JSON looks like this:

{
  "fieldName": "Full Inspection",
  "fieldType": "String",
  "fieldValue": "Starting at $99.99"
}

This can now be validated easily with any JSON parser. You could also keep a "golden copy" of the JSON and compare against it every time the UI changes.
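As a minimal sketch of the golden-copy idea, the snippet below reads the string fields out of a flat JSON object like the one above and compares them with a stored copy. It uses only the JDK. The regex-based reader is an assumption that holds only for flat objects with string values and no escaped quotes, so a real project would more likely use a JSON library such as Jackson or Gson. The names `GoldenCopyCheck`, `readFlatJson`, and `diff` are hypothetical and not part of Tools4AI.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class GoldenCopyCheck {

    // Matches "key": "value" pairs; assumes flat JSON, string values, no escaped quotes
    private static final Pattern PAIR =
            Pattern.compile("\"([^\"]+)\"\\s*:\\s*\"([^\"]*)\"");

    /** Reads the string fields of a flat JSON object into an ordered map. */
    public static Map<String, String> readFlatJson(String json) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher m = PAIR.matcher(json);
        while (m.find()) {
            fields.put(m.group(1), m.group(2));
        }
        return fields;
    }

    /** Returns one message per golden-copy field that is missing or different in the actual JSON. */
    public static List<String> diff(String goldenJson, String actualJson) {
        Map<String, String> golden = readFlatJson(goldenJson);
        Map<String, String> actual = readFlatJson(actualJson);
        List<String> differences = new ArrayList<>();
        for (Map.Entry<String, String> e : golden.entrySet()) {
            String found = actual.get(e.getKey());
            if (!e.getValue().equals(found)) {
                differences.add(e.getKey() + ": expected '" + e.getValue()
                        + "' but was '" + found + "'");
            }
        }
        return differences;
    }

    public static void main(String[] args) {
        String golden = "{\"fieldName\": \"Full Inspection\", \"fieldType\": \"String\","
                + " \"fieldValue\": \"Starting at $99.99\"}";
        String changed = "{\"fieldName\": \"Full Inspection\", \"fieldType\": \"String\","
                + " \"fieldValue\": \"Starting at $109.99\"}";
        System.out.println(diff(golden, golden));  // empty list: nothing changed
        System.out.println(diff(golden, changed)); // one entry, for fieldValue
    }
}
```

Reporting every differing field, rather than stopping at the first one, tends to make a failed UI check easier to triage.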

The AI-Based Method: Option 2 With POJOs

Tools4AI can also take a screenshot of the page, identify the elements for you, and convert them into a POJO (Plain Old Java Object):

import lombok.*;
@Getter
@Setter
@ToString
@NoArgsConstructor
@AllArgsConstructor
public class AutoRepairScreen {
    double fullInspectionValue;
    double tireRotationValue;
    double oilChangeValue;
    Integer phoneNumber;
    String email;
    String[] customerReviews;
}

// Cast to the target type, then check the field that AutoRepairScreen actually declares (oilChangeValue)
AutoRepairScreen screenData = (AutoRepairScreen) aiProcessor.imageToPojo("screenshot.png", AutoRepairScreen.class);
assert screenData.getOilChangeValue() == 29.99;

This method not only saves time but also enhances the robustness of UI testing. Testers can focus on the logic and data rather than on the underlying code required to fetch this information. Furthermore, as UIs evolve, tests written with Tools4AI can adapt to changes with minimal to no maintenance, provided the AI continues to recognize the UI elements correctly.
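One practical detail when validating prices: Option 1 returns text such as "Starting at $99.99", while the price fields of `AutoRepairScreen` are `double`s, where an exact `==` comparison can be brittle if a value has gone through any arithmetic or conversion. The sketch below shows one way to handle both cases with the JDK alone. `PriceAssert`, `parsePrice`, `samePrice`, and the half-cent tolerance are illustrative assumptions, not part of Tools4AI.

```java
import java.util.OptionalDouble;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class PriceAssert {

    // First dollar amount in the text, e.g. "$99.99" or "$1,299.50"
    private static final Pattern PRICE =
            Pattern.compile("\\$\\s*(\\d[\\d,]*(?:\\.\\d+)?)");

    /** Extracts the first dollar amount from free text, if there is one. */
    public static OptionalDouble parsePrice(String text) {
        Matcher m = PRICE.matcher(text);
        if (!m.find()) {
            return OptionalDouble.empty();
        }
        // Drop thousands separators before parsing
        return OptionalDouble.of(Double.parseDouble(m.group(1).replace(",", "")));
    }

    /** True when the two amounts differ by less than half a cent. */
    public static boolean samePrice(double expected, double actual) {
        return Math.abs(expected - actual) < 0.005;
    }

    public static void main(String[] args) {
        OptionalDouble price = parsePrice("Starting at $99.99");
        System.out.println(price.isPresent() && samePrice(99.99, price.getAsDouble()));
        System.out.println(samePrice(29.99, 39.99));
    }
}
```

Returning an `OptionalDouble` keeps "no price found in the text" distinct from "price found but wrong", which are different kinds of test failure.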

Tools4AI exemplifies how AI can simplify complex processes, offering a glimpse into a future where AI and machine learning continually reduce the manual workload in software development and quality assurance.

Example 2

Now, let's look at a second example. Here we have a gym schedule. Someone snaps a photo of it or captures a screenshot of the gym's webpage using Selenium. We're going to explore how this visual information is turned into a usable Java object or validated through an AI-driven process.

Picture a typical gym timetable: rows of classes and times across a grid representing the week. Traditionally, if you needed to digitize or validate this schedule, it would be a manual process. You’d have to type out each class into a database or cross-reference times and offerings with a source of truth.

Enter the AI solution by Tools4AI. It simplifies the transition of data from a visual format into structured, validated Java objects.

GeminiImageActionProcessor processor = new GeminiImageActionProcessor();

Here’s how it works:

Object pojo = processor.imageToPojo(GeminiImageExample.class.getClassLoader().getResource("fitness.PNG"), MyGymSchedule.class);
log.info(pojo.toString());

The POJO looks like this:

@Getter
@Setter
@NoArgsConstructor
@ToString
public class MyGymSchedule {
  @ListType(Activity.class)
  List<Activity> myWeeklyActivity;

}

@Getter
@Setter
@NoArgsConstructor
@ToString
public class Activity {
    String dayOfTheWeek;
    String activityName;
}

Image to Data Interpretation: First, the image of the gym schedule is processed. AI algorithms examine the picture, detect the text and its organization — understanding that “Monday at 7 PM” corresponds to “Yoga,” for example.

Data Mapping: The interpreted data is then mapped to Java objects. In this case, the AI populates a MyGymSchedule object that contains a List<Activity>. Each Activity object holds the dayOfTheWeek and activityName, reflecting the schedule.

Validation and Use: If this process is part of a test, the populated Java objects can now be easily validated. Does the MyGymSchedule object contain "Yoga" under "Monday"? If the AI's interpretation aligns with the expected data, the test passes.

The result:

INFO: MyGymSchedule(myWeeklyActivity=[Activity(dayOfTheWeek=Monday, 
activityName=LES MILLS VIRTUAL RPM), Activity(dayOfTheWeek=Monday, 
activityName=VIRTUAL NEWBODY), Activity(dayOfTheWeek=Tuesday, 
activityName=VIRTUAL NEWBODY), Activity(dayOfTheWeek=Wednesday, 
activityName=VIRTUAL ATHLETIC RIDE), Activity(dayOfTheWeek=Thursday,
 activityName=VIRTUAL CYCLING), Activity(dayOfTheWeek=Friday, 
activityName=VIRTUAL AWESOME ABS), Activity(dayOfTheWeek=Friday, 
activityName=VIRTUAL NEWBODY), Activity(dayOfTheWeek=Saturday, 
activityName=AQUAFIT), Activity(dayOfTheWeek=Sunday, activityName=AQUAFIT)])
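The validation step described above can be sketched in plain Java. To keep the example runnable on its own, it uses a hand-written stand-in for `Activity` (no Lombok or Tools4AI annotations) and works on the list directly. `ScheduleCheck`, `hasActivity`, and `sample` are hypothetical names, and the sample entries are copied from the logged result.

```java
import java.util.Arrays;
import java.util.List;

final class ScheduleCheck {

    /** Simplified stand-in for the article's Activity POJO. */
    static final class Activity {
        final String dayOfTheWeek;
        final String activityName;

        Activity(String dayOfTheWeek, String activityName) {
            this.dayOfTheWeek = dayOfTheWeek;
            this.activityName = activityName;
        }
    }

    /** True if the schedule lists the activity on the given day (case-insensitive match). */
    public static boolean hasActivity(List<Activity> schedule, String day, String name) {
        return schedule.stream().anyMatch(a ->
                a.dayOfTheWeek.equalsIgnoreCase(day)
                        && a.activityName.equalsIgnoreCase(name));
    }

    /** A few entries taken from the logged result. */
    public static List<Activity> sample() {
        return Arrays.asList(
                new Activity("Monday", "LES MILLS VIRTUAL RPM"),
                new Activity("Monday", "VIRTUAL NEWBODY"),
                new Activity("Saturday", "AQUAFIT"));
    }

    public static void main(String[] args) {
        List<Activity> schedule = sample();
        System.out.println(hasActivity(schedule, "Monday", "Virtual NewBody"));
        System.out.println(hasActivity(schedule, "Monday", "Yoga"));
    }
}
```

Matching case-insensitively is a deliberate choice here, since text read from an image may not preserve the capitalization you expect.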

The real beauty of this approach is its versatility. Whether you’re dealing with a photo or a screenshot from a website, the method remains the same. The result is a quick and accurate digitization or validation of the information that was once locked in a static image, now dynamically accessible and usable within any system that can handle Java objects.

 public static void main(String[] args) throws AIProcessingException {
        GeminiImageActionProcessor processor = new GeminiImageActionProcessor();
        String jsonStr = processor.imageToJson(GeminiImageExample.class.getClassLoader().getResource("images/auto.PNG"),"Full Inspection");
        log.info(jsonStr);
        jsonStr = processor.imageToJson(GeminiImageExample.class.getClassLoader().getResource("images/auto.PNG"),"Full Inspection","Tire Rotation","Oil Change");
        log.info(jsonStr);
        jsonStr = processor.imageToJson(GeminiImageExample.class.getClassLoader().getResource("images/auto.PNG"), AutoRepairScreen.class);
        log.info(jsonStr);
        jsonStr = processor.imageToJson(GeminiImageExample.class.getClassLoader().getResource("images/fitness.PNG"), MyGymSchedule.class);
        log.info(jsonStr);
        Object pojo = processor.imageToPojo(GeminiImageExample.class.getClassLoader().getResource("images/fitness.PNG"), MyGymSchedule.class);
        log.info(pojo.toString());
        pojo = processor.imageToPojo(GeminiImageExample.class.getClassLoader().getResource("images/auto.PNG"), AutoRepairScreen.class);
        log.info(pojo.toString());
    }

Complete code for this article is available here

Conclusion

Testing user interfaces traditionally involves a significant amount of manual scriptwriting, often requiring tedious element location and extraction. Tools4AI revolutionizes this process, utilizing AI to convert UI screens into JSON objects or Java POJOs (Plain Old Java Objects). This transformation facilitates a more streamlined approach to interacting with and validating UI data.

With Tools4AI’s image processing capabilities, extracting data from a UI becomes as simple as feeding a screenshot into the system. The AI then analyzes the image, identifies text and UI components, and converts this information into a structured JSON or a Java object.
