Johannes Dienst for AskUI

Posted on Oct 10, 2022 • Edited on Jun 16, 2023 • Originally published at askui.com

Challenges in User Interface Automation: Current State

#programming #testing

User Interface (UI) automation is hard today because most tools rely on structural clues rather than on visual recognition of UI elements.

In this three-part blog series we show where the current challenges of UI automation lie, which solutions exist, and how AI/ML can help solve them in the future.

What is a User Interface

A user interface is everything a user uses to interact with a machine. Today it is usually visual and operated by touch or by physical controls such as the keys of a laptop keyboard.

Human vs Computer Vision

We humans have a knack for visual user interfaces as our brain is really good at finding patterns. Look at the following picture for example:

We instantly recognise that it is some kind of login form with a button we can click or tap, which should then take us to a dashboard of sorts. We do not have to think about it, thanks to our wonderful brain 🥳

This usually happens in a web browser, so let us assume that a computer is looking at the webpage that displays the login form above. What do you think a computer would see? Probably something like the following:

Example Document Object Model (DOM) tree visualised with http://bioub.github.io/dom-visualizer

The reason a computer only sees the Document Object Model (DOM) here is that it has no means to see how the website looks visually. Therefore it lacks the ability to match visual patterns. Instead, it analyses the structure of the webpage to understand what is in the browser!

But what is the DOM actually?

DOM - Document Object Model

The DOM is a structure that is usually represented as a tree with nodes. It is derived from HTML and has methods to traverse and access the nodes.
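To make the tree idea concrete, here is a minimal sketch that models a DOM-like tree with plain JavaScript objects and walks it depth-first. The node shape (`tag`, `attrs`, `children`) and the function `walk` are assumptions made for illustration; a real browser exposes much richer objects than this.

```javascript
// Hypothetical, simplified stand-in for a DOM tree (not the browser API).
const tree = {
  tag: 'html',
  attrs: { lang: 'en' },
  children: [
    { tag: 'head', attrs: {}, children: [{ tag: 'title', attrs: {}, children: [] }] },
    {
      tag: 'body',
      attrs: {},
      children: [
        {
          tag: 'div',
          attrs: { class: 'container' },
          children: [
            { tag: 'button', attrs: { id: 'loginButton', class: 'login' }, children: [] },
          ],
        },
      ],
    },
  ],
};

// Depth-first traversal: visit a node, then each of its children in order.
function walk(node, visit, depth = 0) {
  visit(node, depth);
  for (const child of node.children) {
    walk(child, visit, depth + 1);
  }
}

// Collect an indented outline of the tree, one line per node.
const outline = [];
walk(tree, (node, depth) => outline.push('  '.repeat(depth) + node.tag));
console.log(outline.join('\n'));
```

Every node is reached through its parent, which is why tools that work on the DOM describe an element by its position and attributes in this structure.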

Shadow DOM

Web Components are a way to encapsulate a complete DOM tree with its stylesheets inside an existing DOM without affecting the DOM outside of it.

To achieve this, a web component uses the Shadow DOM, which can be traversed like a regular DOM.

For an in-depth explanation visit the MDN Web Docs.

Selector Based Automation

Instead of recognising elements visually, the computer has to rely on one of two techniques to find them: XPath or CSS selectors.

To show how that works we define a simple HTML page first and then discuss the two techniques used today to find elements in the resulting DOM:

<!DOCTYPE html>
<html lang="en">
  <head>
    <meta charset="utf-8">
    <title>A Basic example</title>
  </head>
  <body>
    <div class="container">
      <button id="loginButton" type="button" class="login">Login</button>
    </div>
  </body>
</html>

Select with XPath

XPath was developed to find elements in XML. Because HTML has a tree structure similar to XML, it can be queried with XPath as well.

Valid XPath expressions to find the Login button are:

//button
//*[@id="loginButton"]
/html/body/div/button

To get a better grip on XPath use the XPath Cheatsheet.
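The absolute path `/html/body/div/button` can be read as a series of steps starting at the root. The following is a minimal sketch of that idea on a plain-object tree. It handles only absolute paths made of tag names and is not an XPath implementation; `resolvePath` and the node shape are hypothetical names chosen for illustration.

```javascript
// Hypothetical, simplified tree mirroring the example page (not the browser DOM).
const page = {
  tag: 'html',
  children: [
    { tag: 'head', children: [] },
    {
      tag: 'body',
      children: [
        { tag: 'div', children: [{ tag: 'button', id: 'loginButton', children: [] }] },
      ],
    },
  ],
};

// Resolve an absolute path such as "/html/body/div/button".
// Each step keeps only the children whose tag matches the step name.
function resolvePath(root, path) {
  const steps = path.split('/').filter(Boolean);
  // The first step must match the root element itself.
  let current = root.tag === steps[0] ? [root] : [];
  for (const step of steps.slice(1)) {
    current = current.flatMap((node) => node.children.filter((c) => c.tag === step));
  }
  return current;
}

const found = resolvePath(page, '/html/body/div/button');
console.log(found.map((node) => node.id));
```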

CSS Selectors

Cascading Style Sheets also use selectors and most frameworks support CSS selectors.

Valid CSS selectors for our button are:

#loginButton /* id selector */
.login /* class selector */
button /* type selector */

See this cheatsheet for more CSS selector goodness.
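The three selector kinds above each test a different property of an element. Here is a minimal sketch of that matching logic on plain objects; `matchesSimple` and the element shape are assumptions for illustration and cover only these three simple selectors. In a browser, `document.querySelector('#loginButton')` performs the real lookup.

```javascript
// Hypothetical, simplified elements (not the browser DOM).
const elements = [
  { tag: 'div', id: '', classes: ['container'] },
  { tag: 'button', id: 'loginButton', classes: ['login'] },
];

// Match one simple selector: "#id", ".class", or a bare tag name.
function matchesSimple(element, selector) {
  if (selector.startsWith('#')) return element.id === selector.slice(1);
  if (selector.startsWith('.')) return element.classes.includes(selector.slice(1));
  return element.tag === selector;
}

// Try each selector from the list above against all elements.
for (const selector of ['#loginButton', '.login', 'button']) {
  const hits = elements.filter((el) => matchesSimple(el, selector));
  console.log(selector, '->', hits.map((el) => el.tag));
}
```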

Play around with our Git Repository

Maybe you want to see the examples for yourself and play with the code. Do not worry, we got you covered with a Git Repository which contains our example HTML and a test with the popular Cypress testing framework.

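// Note: cy.xpath() is not built into Cypress; it is added by a plugin
// such as cypress-xpath, which must be loaded in the support file.
describe('xpath and css selector example spec', () => {
  it('finds login button with xpath and css selectors', () => {
    // Load the example HTML page from the local file system
    cy.visit('../minimalExamplePage.html');

    // XPath: by tag name, by id attribute, by absolute path
    cy.xpath('//button').should('be.visible');
    cy.xpath('//*[@id="loginButton"]').should('be.visible');
    cy.xpath('/html/body/div/button').should('be.visible');

    // CSS: id selector, class selector, type selector
    cy.get('#loginButton').should('be.visible');
    cy.get('.login').should('be.visible');
    cy.get('button').should('be.visible');
  });
});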

Follow the steps in the README to start up Cypress and execute the test.

Conclusion

If you need to write automation for an application that runs in a browser, there is a plethora of tools to choose from. We used Cypress as one example of how they work.

In the next installment of this series we look at the problems selector-based approaches have and how current tools are trying to remedy them.

Do you want to know more about UI automation's future with askui? Join our Discord community!