
Steven Lemon

Our team's trouble with hand-written automated UI tests

Before you can release a new feature, you need to make sure that your existing features still work. You give each release to the QA team to perform manual regression testing. The testers/QA team have their scripts and spend a couple of days stepping through them on the hunt for regressions and bugs. Over time, you add new features, the scripts grow in size, and so does the time it takes to perform manual tests. Your reliance on manual testing starts to become problematic, and so you start looking for alternatives. Automated UI testing sounds appealing. It seems to promise that you can keep running your same regression test scripts, but replace the hands and eyes of a human with those of an automation framework.

Everyone starts to get really excited about automated UI testing.

  • Manual regression testing is a tedious task that everyone is happy to see replaced.
  • It frees up the QA team's time for ad-hoc and exploratory testing.
  • When the manual regression testing step takes so much time to complete, small delays can put your release at risk. Perhaps testing needs to be restarted, the start time is pushed back a few days, or your regression environment needs to host two different releases at the same time.
  • Your release cadence is limited by manual regression testing. Two or more days of manual regression testing means you can, at best, release twice a month. Moreover, you'll need to release everything in one go. It's all or nothing, since you need to test everything together.
  • Automated tests are tangible. You can have them running on devices on your wall and show them off to visitors.
  • Automation means that regression testing can happen as you develop, inside your sprint, reducing the need to throw work over a wall and wait days for the results.

You could purchase a commercial tool that helps you create and manage your tests. Or, perhaps your framework of choice comes with a built-in automation solution. Great, this article isn't for you. Alternatively, you might be considering using tooling like Selenium or Appium to hand-write your tests. This is the approach my team was given, and after several months of work, we abandoned the tests. They had not proven to be a good fit for our test suites, our architecture, our team, or our expectations. Through this process, we learned many lessons and encountered many problems that should have been considered upfront.

Does it fit your manual regression suite?

Be realistic about what automated testing can cover: it won't be your entire manual regression suite. Some parts are going to be too complicated or too time-consuming to be worth automating.

  • Long chains of actions that cannot be split up. The unreliability of UI tests can make it challenging to get all the way through in one run.
  • Testing interaction with a second application.
  • Checking the output of PDFs and other generated files.
  • Testing tasks that interact with Windows or the Windows file system.
  • Tests where subsequent runs will have different results. Will your test run be affected by the results of previous test runs? Manual regression might happen once a fortnight, while automated UI tests might retry the same test multiple times a minute or hour, increasing your chance of collisions.
  • Tests where your application could be left in an inconsistent state if the test run fails or crashes halfway. Would this need human intervention to remedy?
  • Where you don't have sufficient control over the data being displayed in a section of the app, making it difficult to set up test preconditions. Do the testers have to hunt around the app looking for matching data, rather than being able to create or directly navigate to that scenario?

Be careful about becoming overly dogmatic, forcing UI automation tests where they don't fit. Not only will they be hard to write, they will end up unreliable and difficult to maintain. Be realistic about what can be automated before you start. Whoever is automating the tests needs the freedom to say no.

Does it fit your application's architecture?

Depending on how your application is structured and how it has grown, you might find automation takes an unreasonable amount of time to set up.
UI automation is one part writing the test steps and one part setting up the test infrastructure. If you follow the Page Object Model pattern, then for each page and control in your application, you create models so your tests can find and interact with the elements on that page or control. The amount of infrastructure code you need to write depends on the project. Do you have a few different pages taking many different inputs, or many workflows spread across a lot of specialized pages? Do you have a small library of controls that you reuse, or is every control bespoke as your UI has changed over time? How you've developed your application up until this point determines how much effort you need to put into writing the test infrastructure. In turn, this impacts how long it takes to write your automated UI tests.
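To make the Page Object Model concrete, here is a minimal sketch using Selenium's Python bindings. The pages, locators, and URL are all hypothetical, and a real model would also need to wrap waits and error handling.

```python
# Minimal Page Object Model sketch (Selenium Python bindings).
# LoginPage, DashboardPage, and all locators are hypothetical.
from selenium import webdriver
from selenium.webdriver.common.by import By


class LoginPage:
    """Owns the locators for one page so tests can express intent."""

    def __init__(self, driver):
        self.driver = driver

    def open(self):
        self.driver.get("https://example.test/login")
        return self

    def log_in(self, username, password):
        self.driver.find_element(By.ID, "username").send_keys(username)
        self.driver.find_element(By.ID, "password").send_keys(password)
        self.driver.find_element(By.ID, "submit").click()
        return DashboardPage(self.driver)


class DashboardPage:
    def __init__(self, driver):
        self.driver = driver

    def welcome_text(self):
        return self.driver.find_element(By.CSS_SELECTOR, ".welcome").text


if __name__ == "__main__":
    # The test reads as steps; the page models own the selectors.
    driver = webdriver.Chrome()
    dashboard = LoginPage(driver).open().log_in("user", "secret")
    assert "Welcome" in dashboard.welcome_text()
    driver.quit()
```

The appeal is that when the UI changes, only the page model should need to change; in practice, as discussed below, the models themselves can end up being as much work as the tests.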

Are your automated tests going to find bugs and regressions?

Before you start, check whether automated UI tests are going to find the regressions that you expect. You should have a record of bugs previously found during regression and in production. How many of them do you plan to catch with automated UI tests?

There are many kinds of issues that UI tests won't find:

  • Issues in steps that are not included in the paths laid out in your manual regression scripts. How often are bugs found because someone is testing that area of the app, rather than being explicitly included in a test step?
  • When both a feature and its tests are incorrect.
  • Bugs in edge cases or uncommon scenarios.
  • Anything that is caught by your unit and integration test layers.
  • Any action whose result isn't visible in your application. Avoid trying to hide data in your application just for your automation tests.
  • Visual errors.
  • Performance problems.
  • Any test cases that end up being too complicated and challenging to automate.

What role are automated tests going to have in your process? How can they best support your QA team and regression process? Perhaps, rather than finding bugs, you aim to free up QA time. You could skip the areas that QA covers when performing more exhaustive ad-hoc and exploratory testing. What you expect automated UI tests to find should inform which areas you choose to include and how many tests you plan to write.

How much are you expecting to spend?

It is easy to compare the time you spend writing tests with the time you might save. Automating one of our features took over 200 hours to save 20 minutes each release. Given that automated tests cost much more time than they save, are the benefits you envision gaining worth it? Are they going to take so long to create that you will never get all of the way through the test suite?

Who will write the tests?

You might hope that by using the Page Object Model pattern, the developers can write the test infrastructure which QA can then use to write the tests. Our experience didn't pan out that way, with the developers needing to write both the infrastructure and the tests.

  • Your test infrastructure might not be reusable across multiple tests. Without reuse, you end up writing the support code at the same time you write the tests.
  • Writing the tests might also require many updates to your application.
  • The automation framework doesn't provide enough information to know whether the test failed because of the infrastructure or the tests.
  • If your QA team lacks experience with coding or automation, you might not be able to make the framework simple enough to use.
  • The tests require too much knowledge of the internals of the application.
  • Test flakiness causes the developers to keep returning to fix up the test infrastructure.

As you are working on your proofs of concept for automated tests, involve whoever is intended to extend and maintain the tests. Ensure that what you are making is appropriate for their skill set and understanding of your application.

Do you have a clean dataset to test against?

When you start, you might use the same database that you use for your regular development activities. However, before long, you will find yourself spending more and more time working around your dataset.

  • You have to hunt for preconditions rather than having them already set up or easy to create.
  • Your UI changes as more data is created. For example, extra data pushes an element off the page, and tests fail because they cannot interact with it.
  • The same test might be rerun within the same minute. You need to check whether an element belongs to the current test or a previous test run (one mitigation is sketched after this list).
  • Tests fail because a test user has entered an unexpected state, requiring either manual intervention or the tests pre-filtering users in each invalid state.
  • Sweeping changes to your dataset change the data you had been working with. For example, you might periodically clear your developer database or refresh it with data imported from another system.
  • Simultaneous test runs or developers using the same database lead to unexpected interactions and test failures.
  • The temptation sneaks in to run the tests against multiple environments: against the dev database during development, and against the regression environment during sign-off.
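One mitigation for leftovers from earlier runs is to tag every entity a test creates with an ID unique to the current run, so searches never match stale data. The helper below is our own illustration, not part of any framework.

```python
# Tag created entities with a per-run ID so the current run's data can be
# told apart from previous runs'. Helper names are hypothetical.
import uuid

RUN_ID = uuid.uuid4().hex[:8]  # regenerated for every test run


def run_scoped_name(base):
    """Build an entity name unique to this run, e.g. 'invoice-3f9a2c1d'."""
    return f"{base}-{RUN_ID}"
```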

Each of your tests may have various preconditions that need to be set up. Relying on the automated tests to set up their own preconditions will turn each test into a long chain of actions. Not only will these extra steps make the tests slower to write and run, but they will also make them flakier and their failure points harder to track down. What if you can't create your test scenario's preconditions from within your app? Do the tests need to hunt around the app looking for appropriate data?
With a clean dataset, you can have known test conditions and known test users, similar to how you use an object mother in unit tests.

You want a database that can be reset and populated with fixed data. If you don't already have this, then you will require a lot of new infrastructure: a new database, a tool for populating valid test data, APIs that point to this new database, and build pipelines for deploying this environment.
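As a rough sketch of what that infrastructure buys you, assuming your test environment exposes internal reset and seed endpoints (both hypothetical here), a session-wide pytest fixture could restore the known dataset before every run:

```python
# Sketch: reset to a known dataset before the run starts. The TEST_API
# value and the /internal/reset and /internal/seed endpoints are
# hypothetical; substitute whatever your environment provides.
import pytest
import requests

TEST_API = "https://test-env.example.test"


@pytest.fixture(scope="session", autouse=True)
def clean_dataset():
    # Wipe the test database, then load fixed, known records so every run
    # starts from the same preconditions (an object mother for UI tests).
    requests.post(f"{TEST_API}/internal/reset", timeout=60).raise_for_status()
    requests.post(
        f"{TEST_API}/internal/seed",
        json={"users": [{"name": "known-user", "state": "active"}]},
        timeout=60,
    ).raise_for_status()
```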

Your UI framework and components are doing more than you realize

Each of your tests is going to need to account for everything that each UI component can do.
Take the following example where we don't reset the database between tests, and we want to click on an element in a list.

  • We click on the element; the test passes.
  • Subsequent runs add more items to the list, pushing the target offscreen; we need to update our tests to jump to the element before clicking.
  • The list then gets so long that UI virtualization kicks in. Our target no longer exists on the page; we can't jump to it and instead need to search for it by slowly scrolling through the list (a helper for this is sketched after this list).
  • Duplicates appear in the list; we need to figure out which element is from the current test run.
  • Another element grows in size, pushing the entire list off-screen; we need to scroll to the list before interacting with it.
  • A previous test run failed to complete, and the test entity is left in a state that hides the entire list.
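Here is the kind of helper that list eventually forces you to write, sketched with Selenium's Python bindings. The container and row locators are hypothetical, and a virtualized list in your UI framework may need different scrolling calls.

```python
# Search a virtualized list by scrolling: off-screen rows don't exist in
# the DOM, so scroll the container in steps and re-query each time.
# The container selector and ".row" locator are hypothetical.
from selenium.webdriver.common.by import By


def find_in_virtualized_list(driver, container_css, item_text, max_steps=50):
    container = driver.find_element(By.CSS_SELECTOR, container_css)
    for _ in range(max_steps):
        # Only rendered rows are present, so check what exists right now.
        for row in container.find_elements(By.CSS_SELECTOR, ".row"):
            if row.text == item_text:
                return row
        # Scroll one viewport further so the list renders the next chunk.
        driver.execute_script(
            "arguments[0].scrollTop += arguments[0].clientHeight;", container
        )
    raise AssertionError(f"{item_text!r} not found after {max_steps} scrolls")
```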

Each UI control in your application will require a similar process. You might find yourself revisiting tests for weeks after you create them as you hit edge cases in your UI controls that you hadn't expected.

Flakiness

Automated tests fail frequently, and often, you're not going to know why.
Failures can happen for many reasons:

  • The automation framework you are using fails to find an element onscreen.
  • The automation framework fails to recognise that your application has started.
  • The test driver fails to connect.
  • You encounter an edge case in a UI component.
  • An element is pushed off-screen so your automation framework cannot interact with it.
  • Timing issues: perhaps a mask doesn't quite finish hiding before the test attempts to click an element (an explicit-wait mitigation is sketched after this list).
  • The tests work differently at different screen sizes and resolutions as different elements are on or off the screen.
  • All of the issues mentioned previously with not having a clean, isolated database instance for each test run.
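For the timing issues in particular, the usual mitigation is an explicit wait instead of an immediate click. A minimal sketch using Selenium's Python bindings; the locator and timeout are placeholders.

```python
# Wait until the element is actually clickable (visible and enabled, e.g.
# the loading mask has finished hiding) before clicking. The By.ID locator
# is a placeholder.
from selenium.webdriver.common.by import By
from selenium.webdriver.support.ui import WebDriverWait
from selenium.webdriver.support import expected_conditions as EC


def click_when_ready(driver, locator=(By.ID, "save"), timeout=10):
    element = WebDriverWait(driver, timeout).until(
        EC.element_to_be_clickable(locator)
    )
    element.click()
```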

We were using Appium and WinAppDriver, and for most of the failures, we were given no useful error message, no logs and no stack traces. We had tests failing because an element couldn't be found, but no way of telling which element was at fault. Worse, since the failures were intermittent, and could be device or environment specific, it took a long time to determine the cause.

One solution to flaky tests is to run each test until it passes. This poses several problems. First, the duration of your test runs gets longer, making it harder to get timely feedback from your tests. Second, it becomes harder to write new tests, since you might be waiting ten minutes or more to check a single change. Ideally, you would address flakiness whenever it increases: track test flakiness over time, and group the vague error messages you receive. Knowing when flakiness started can be an essential clue to tracking it down when you don't have useful logs.
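If you do rerun failing tests, at least record every attempt so the retries don't erase the flakiness data you want to track. A purely illustrative sketch:

```python
# Retry a flaky test a few times, but log every failed attempt, including
# the ones a later success would otherwise hide. Purely illustrative.
import functools
import logging

log = logging.getLogger("flakiness")


def retry(times=3):
    def decorator(test_fn):
        @functools.wraps(test_fn)
        def wrapper(*args, **kwargs):
            for attempt in range(1, times + 1):
                try:
                    return test_fn(*args, **kwargs)
                except Exception as exc:
                    log.warning("%s attempt %d failed: %s",
                                test_fn.__name__, attempt, exc)
                    if attempt == times:
                        raise
        return wrapper
    return decorator
```

Plugins such as pytest-rerunfailures provide rerun behaviour out of the box, but a wrapper like this keeps the per-attempt records that make flakiness trends visible.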

To tackle flakiness, we resorted to maintaining a long list of everything that could cause flakiness — all of the edge cases and UI interactions between our test suite and our application. Not only did creating this take a long time and a lot of trial and error, but it also increased the learning curve for sharing the test suite with other developers.

Refactoring is hard

Automated UI tests are difficult to refactor. The tests can take hours to run, making it hard to get feedback on sweeping changes. Some tests might rely heavily on carefully arranged timings and break as soon as anything is changed.
With automated testing likely being new to your team, you risk ending up with many different approaches as the developers try to figure out the best strategies and then struggle to apply them to the test cases. Having many different approaches makes it hard for people new to the project to tell which one to follow. It also has consequences when you change your application's UI: you might find yourself needing to change dozens of automated UI tests, each with a different implementation.

Human factors

When bringing a new tool, technology or process into a team, there are a variety of human factors to consider:

  • What is the quality of life of using the new tool? Is it frustrating or slow?
  • Is there someone available and willing to be a champion for the new technology? Who takes over if they leave?
  • What happens when the tool causes delays? Are automated tests going to get dropped when you run out of time? How much extra time will the business tolerate automated UI tests adding to a feature?
  • What happens if the tool gains a bad reputation amongst the team?
  • Is everyone on board with the value of writing automated tests, or do they believe it is a waste of time?

As we have covered so far, there are a lot of potential pain points and many questions regarding the value of these tests. Without answers, your test suite is unlikely to last for long.

Is there a better option?

Perhaps creating automated UI tests isn't looking like such an appealing option. However, you still don't want manual regression testing taking so much time, so what other options are there?

Don't try to implement the entire manual regression script

Avoid trying to automate your entire manual test suite. It was written for human testers, without any awareness of what is difficult or impossible to automate. It is vital that whoever is writing the automated tests has the option to decide not to automate a test case. Be ruthless about culling what you automate. Automating features that are not appropriate to automate will not only take a long time, but also result in the flakiest and hardest-to-maintain tests.

Fill out the rest of the Test Pyramid first

No single type of test should provide complete coverage. You want a variety of tests with varying levels of specificity and isolation — lots of specific, isolated tests at the bottom of the pyramid. Then, as you move up, there are fewer tests that get less specific and cover more parts of the application. Unit tests on the bottom, then integration tests, then end-to-end tests.

Every layer of the pyramid works in concert and has different strengths and weaknesses.
If possible, we would rather cover as much as we can at the unit and integration layers. These tests are easier to write, provide more specific feedback, and can be run during development. Unit tests are better for covering edge cases and error scenarios. Automated UI tests can cover UI logic that, depending on your application, might not be possible to cover with unit tests. Automated UI tests also test that multiple parts of your application work together as expected.

What does your application's pyramid currently look like, and what will it look like after your planned UI automation suite? Is it upside down, or hourglass-shaped? Are you planning to write too many UI automation tests because the rest of the pyramid isn't there?

If you already have unit and integration tests, you are probably already covering steps of your manual regression test script. There is little value in writing complicated automated UI tests to cover something already covered. Rather than replacing your manual regression tests 1-to-1 with automated UI tests, can you replace them with a combination of tests of different types?

Revisit commercial tooling

Revisit why you chose to hand-write your automated UI tests. Are those reasons still valid after taking into account the difficulties of hand-writing them? One of the primary reasons we had dismissed commercial tooling was a concern that it couldn't cover all of our manual test suite. Many months later, our hand-written UI tests had proven so slow to write that we hadn't even made a dent in what we had hoped to cover.

Subcutaneous testing

Are you writing UI automation to test your UI, or to facilitate end-to-end tests? If you don't need to test the UI layer, then subcutaneous testing might be a better alternative. This approach lets you perform your end-to-end tests a step below the UI layer. Rather than clicking buttons or filling in text fields, you call event handlers and set public properties directly on your view models. This approach avoids the difficulties of interacting with the UI and of using an automation framework. The disadvantage of this approach is that depending on the technology your application is using, there might not be a lot of specific guidance available. Our application is written in UWP, so we had to figure out for ourselves how to run it from our test framework with the UI mocked out. Once it was working, it proved significantly faster and easier to use than automated UI testing.
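The shape of a subcutaneous test is the same in any MVVM-style application: construct the view model, invoke the handler the button would have invoked, and assert on its properties. An illustrative sketch, in Python with hypothetical names (our real tests targeted UWP view models):

```python
# Subcutaneous test sketch: drive the view model directly, with no UI and
# no automation driver. OrderViewModel and its members are hypothetical.
class OrderViewModel:
    def __init__(self, order_service):
        self.order_service = order_service
        self.quantity = 0
        self.error = None

    def submit_command(self):
        # The same handler the Submit button would invoke.
        if self.quantity <= 0:
            self.error = "Quantity must be positive"
            return
        self.order_service.place_order(self.quantity)


def test_submit_rejects_zero_quantity():
    vm = OrderViewModel(order_service=None)
    vm.quantity = 0
    vm.submit_command()
    assert vm.error == "Quantity must be positive"
```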

Conclusion

The potential benefits of UI automation are exciting: find bugs, free up QA time, eliminate manual regression testing, and get feedback to developers during their sprint. However, as with any significant new technology, it is essential to do some investigation up front. Hopefully, the above has provided some questions to ask before you start automating your manual regression test suite by hand. It might not be a good fit, whether for the regression bugs you expect to find, the architecture of your application, or whoever you expect to be writing and maintaining the tests. There are challenges: dealing with unreliable data, a UI that is doing more than you might expect, and flakiness and poor error messages from your automation framework. You need to ask who is going to write the tests, and who is going to champion them when the going gets tough. Finally, have you compared hand-written tests with commercial products, with writing more integration and unit tests, or with writing subcutaneous tests?

What have your experiences with writing automated UI tests been? Perhaps you have some advice for those of us who have been struggling? Let us know in the comments below.
