Brian Neville-O'Neill

Posted on Nov 19, 2019 • Originally published at blog.logrocket.com on Nov 15, 2019

Automated testing is not working

#featuredposts #testing #webdev

Written by Paul Cowan✏️

Before I start, I want to point out that I am not referring to one particular project or any particular individual. Having spoken to others, I believe these problems are industry-wide. Nearly all the automation testers I have worked with have busted a gut to make this faulty machine work. I am hating the game, not the player.

If I am not mistaken, I appear to have awoken in an alternate reality where vast sums of money, time, and resources are allocated to both the writing and the continual maintenance of end-to-end tests. We have a new breed of developer known as the automation tester, whose primary reason for being is not only to find bugs but also to write regression tests that remove the need to re-run the initial manual testing.

Automated regression tests sound great in theory, and anybody starting a new job could not fail to be impressed when finding out that every story in every sprint would have an accompanying end-to-end test written in Selenium WebDriver.

I have heard numerous tales of end-to-end tests, usually written in Selenium WebDriver, getting deleted due to their brittle nature. Test automation seems to result only in CI build sabotage, with non-deterministic tests making change and progression next to impossible. We have test automation engineers who are too busy or unwilling to carry out manual tests and are instead stoking the flames of hell with these underperforming, time- and resource-grasping, non-deterministic tests.

Tests that re-run on failure are standard and even provided by some test runners. Some of the most challenging code to write is being written and maintained by the least experienced developers. Test code does not have the same spotlight of scrutiny shone on it. We never stop to ask ourselves whether this insane effort is worth it. We don’t track metrics, and we only ever add more tests.

It is like a bizarre version of Groundhog Day, only it is a broken build and not a new day that starts the same series of events. I am now going to list the repeating problems that I see on a project laden with the burden of carrying a massive end-to-end test suite.

Wrong expectations that automated tests will find new defects

At the time of writing, nearly all tests assert their expectations on a fixed set of inputs. Below is a simple login feature file:

Feature: Login Action

Scenario: Successful Login with Valid Credentials

  Given User is on Home Page
  When User Navigate to LogIn Page
  And User enters UserName and Password
  Then Message displayed Login Successfully

The feature file executes the following Java code in what is known as a step definition:

@When("^User enters UserName and Password$")
public void user_enters_UserName_and_Password() throws Throwable {
  // Type the fixed credentials into the form, then submit it
  driver.findElement(By.id("log")).sendKeys("testuser_1");
  driver.findElement(By.id("pwd")).sendKeys("Test@123");
  driver.findElement(By.id("login")).click();
}

This test will only ever find bugs if this finite set of inputs triggers the bug. A bug triggered by a user entering anything other than testuser_1 and Test@123 won’t be caught by this end-to-end test. We can increase the number of inputs by using a Cucumber data table:

Given I open Facebook URL
 And fill up the new account form with the following data
 | First Name | Last Name | Phone No | Password | DOB Day | DOB Month | DOB Year | Gender |
 | Test FN | Test LN | 0123123123 | Pass1234 | 01 | Jan | 1990 | Male |

The most likely time that these tests will find bugs is the first time they run. For as long as the above test or tests exist, we will have to maintain them. If they use Selenium WebDriver, then we might run into latency problems on our continuous integration pipeline.

These tests can be pushed down the test pyramid onto the unit tests or integration tests.

Don’t drive all testing through the user interface

I am not saying we should do away with end-to-end tests, but if we want to avoid the maintenance of these often brittle tests, then we should only test the happy path. I want a smoke test that lets me know the most crucial functionality is working. Exceptional paths should be handled at a more granular level in the developer unit tests or integration tests.

The most common reason for a bug in the login example is user input. We should not be spinning up Selenium to test user input. We can write inexpensive unit tests to check user input, and these do not carry the maintenance overhead of an end-to-end test. We still need one end-to-end test for the happy path just to check it all hangs together, but we don’t need end-to-end tests for the exceptional paths.
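
As a rough sketch of what pushing these checks down the pyramid could look like, the plain Java below validates login input without starting a browser. The class name LoginInputValidator and its rules (username length and allowed characters, password length, a digit, and an upper-case letter) are assumptions made purely for illustration; they do not come from any real login form.

```java
import java.util.Arrays;
import java.util.regex.Pattern;

// Hypothetical validator: these rules are illustrative assumptions only.
final class LoginInputValidator {
    // Assumed rule: 3 to 20 letters, digits, or underscores.
    private static final Pattern USERNAME = Pattern.compile("[A-Za-z0-9_]{3,20}");

    static boolean isValidUsername(String username) {
        return username != null && USERNAME.matcher(username).matches();
    }

    // Assumed rule: at least 8 characters, one digit, one upper-case letter.
    static boolean isValidPassword(String password) {
        if (password == null || password.length() < 8) {
            return false;
        }
        boolean hasDigit = password.chars().anyMatch(Character::isDigit);
        boolean hasUpper = password.chars().anyMatch(Character::isUpperCase);
        return hasDigit && hasUpper;
    }
}

final class LoginInputValidatorCheck {
    static void check(boolean condition, String message) {
        if (!condition) {
            throw new AssertionError(message);
        }
    }

    public static void main(String[] args) {
        // The happy-path values from the feature file.
        check(LoginInputValidator.isValidUsername("testuser_1"), "valid username rejected");
        check(LoginInputValidator.isValidPassword("Test@123"), "valid password rejected");

        // Exceptional paths that would be expensive to drive through the UI.
        for (String bad : Arrays.asList("", "ab", "name with spaces", "abcdefghijklmnopqrstu")) {
            check(!LoginInputValidator.isValidUsername(bad), "accepted bad username: " + bad);
        }
        check(!LoginInputValidator.isValidUsername(null), "accepted null username");
        for (String bad : Arrays.asList("", "short1A", "nodigitsHERE", "alllower123")) {
            check(!LoginInputValidator.isValidPassword(bad), "accepted bad password: " + bad);
        }
        System.out.println("all input checks passed");
    }
}
```

Each extra input here costs one line and a few microseconds, which is the trade-off the paragraph above argues for: many cheap checks at the bottom and a single end-to-end test for the happy path.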

Testing can be and should be broken up with most of the burden carried by unit tests and integration tests.

Has everyone forgotten the test pyramid?

Selenium webdriver is not fit for purpose

I have blogged about this previously in my post Cypress.io: the Selenium killer. It is nearly impossible not to write non-deterministic Selenium tests because you have to wait for the DOM and the four corners of the cosmos to be perfectly aligned to run your tests.

If you are testing a static webpage with no dynamic content, then Selenium is excellent. If, however, your website meets one or more of these conditions, then you are going to have to contend with flaky or non-deterministic tests:

The page reads from and writes to a database
JavaScript/Ajax is used to update the page dynamically
JavaScript or CSS is loaded from a remote server
CSS or JavaScript is used for animations
JavaScript or a framework such as React/Angular/Vue renders the HTML

An automation tester faced with any of the above conditions will litter their tests with a series of waits, polling waits, checks for Ajax calls to have finished, checks for JavaScript to have loaded, checks for animations to have completed, etc.

The tests turn into an absolute mess and a complete maintenance nightmare. Before you know it, you have test code like this:

click(selector) {
    const el = this.$(selector)
    // make sure element is displayed first
    waitFor(el.waitForDisplayed(2000))
    // this bit waits for element to stop moving (i.e. x/y position is same).
    // Note: I'd probably check width/height in WebdriverIO but not necessary in my use case
    waitFor(
      this.client.executeAsync(function(selector, done) {
        const el = document.querySelector(selector)

        if (!el)
          throw new Error(
            `Couldn't find element even though we .waitForDisplayed it`
          )
        let prevRect
        function checkFinishedAnimating() {
          const nextRect = el.getBoundingClientRect()
          // if it's not the first run (i.e. no prevRect yet) and the position is the same, anim
          // is finished so call done()
          if (
            prevRect != null &&
            prevRect.x === nextRect.x &&
            prevRect.y === nextRect.y
          ) {
            done()
          } else {
            // Otherwise, set the prevRect and wait 100ms to do another check.
            // We can play with what amount of wait works best. Probably less than 100ms.
            prevRect = nextRect
            setTimeout(checkFinishedAnimating, 100)
          }
        }
        checkFinishedAnimating()
      }, selector)
    )
    // then click
    waitFor(el.click())
    return this;
  }

My eyes water looking at this code. How can this be anything but one big, massive flake that takes time and effort to keep alive?

Cypress.io gets around this by embedding itself in the browser and executing in the same event loop as the application under test, instead of driving the browser remotely. Taking away the asynchronicity and not having to resort to polling, sleeping, and wait helpers is hugely empowering.

The effectiveness of tests is not tracked and we don’t delete bad tests

Test automation engineers are very possessive about their tests, and in my experience, we don’t do any work to identify whether a test is paying its way.

We need tooling that monitors the flakiness of tests, and if the flakiness is too high, it automatically quarantines the test. Quarantining removes the test from the critical path and files a bug for developers to reduce the flakiness.
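
The kind of tooling described above might be sketched as follows. This is a minimal illustration, not an existing tool: the class FlakinessTracker, the idea of measuring flakiness as the fraction of consecutive runs whose result flipped, and the threshold values are all assumptions. It also assumes the recorded runs were made against unchanged code, so that a flip points at the test and not at a real regression.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical tracker: names, metric, and thresholds are illustrative assumptions.
final class FlakinessTracker {
    private final Map<String, List<Boolean>> history = new HashMap<>();
    private final double quarantineThreshold;
    private final int minimumRuns;

    FlakinessTracker(double quarantineThreshold, int minimumRuns) {
        this.quarantineThreshold = quarantineThreshold;
        this.minimumRuns = minimumRuns;
    }

    // Record one pass/fail result for a test, in the order the runs happened.
    void record(String testName, boolean passed) {
        history.computeIfAbsent(testName, key -> new ArrayList<>()).add(passed);
    }

    // Fraction of consecutive run pairs whose result differs (0.0 = perfectly stable).
    // A test that fails every time scores 0.0: that is a real failure, not a flake.
    double flakiness(String testName) {
        List<Boolean> runs = history.getOrDefault(testName, Collections.emptyList());
        if (runs.size() < 2) {
            return 0.0;
        }
        int flips = 0;
        for (int i = 1; i < runs.size(); i++) {
            if (!runs.get(i).equals(runs.get(i - 1))) {
                flips++;
            }
        }
        return (double) flips / (runs.size() - 1);
    }

    // Quarantine only once there is enough history to judge the test.
    boolean shouldQuarantine(String testName) {
        int runCount = history.getOrDefault(testName, Collections.emptyList()).size();
        return runCount >= minimumRuns && flakiness(testName) > quarantineThreshold;
    }

    public static void main(String[] args) {
        FlakinessTracker tracker = new FlakinessTracker(0.3, 5);
        boolean[] loginRuns = {true, false, true, false, true};
        for (boolean passed : loginRuns) {
            tracker.record("login", passed);
            tracker.record("checkout", true);
        }
        System.out.println("quarantine login? " + tracker.shouldQuarantine("login"));
        System.out.println("quarantine checkout? " + tracker.shouldQuarantine("checkout"));
    }
}
```

A real pipeline would feed record() from CI results and act on shouldQuarantine() by removing the test from the critical path and filing a bug; neither step is shown here.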

Eradicate all non-deterministic tests from the face of the planet

If re-running the build is the solution to fixing a test, then that test needs to be deleted. Once developers get into the mindset of pressing the "build again" button, all faith in the test suite is gone.

Re-running the tests on failure is a sign of utter failure

The test runner Courgette can be disgracefully configured to re-run on a fail:

@RunWith(Courgette.class)
@CourgetteOptions(
  threads = 1,
  runLevel = CourgetteRunLevel.FEATURE,
  rerunFailedScenarios = true,
  showTestOutput = true)
public class TestRunner {
}

What is being said by rerunFailedScenarios = true is that our tests are non-deterministic, but we don’t care, we are just going to re-run them because hopefully next time they will work. I take this as an admission of guilt. Current test automation thinking has deemed this acceptable behavior.

If your test is non-deterministic, i.e. it has different behavior when running with the same inputs, then delete it. Non-deterministic tests can drain the confidence of your project. If your developers are pressing the magic button without thinking, then you have reached this point. Delete these tests and start again.
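
One way to make this definition concrete is to run the same check repeatedly with the same inputs and see whether the outcome ever changes. The sketch below does exactly that; the class DeterminismCheck and the two sample checks are hypothetical. Note the asymmetry: a single differing result proves the check is non-deterministic, while N identical results only fail to disprove it.

```java
import java.util.concurrent.atomic.AtomicInteger;
import java.util.function.BooleanSupplier;

// Hypothetical helper for illustration only.
final class DeterminismCheck {
    // Returns true when every one of the runs produced the same result as the first.
    static boolean isDeterministic(BooleanSupplier check, int runs) {
        if (runs < 1) {
            throw new IllegalArgumentException("runs must be at least 1");
        }
        boolean first = check.getAsBoolean();
        for (int i = 1; i < runs; i++) {
            if (check.getAsBoolean() != first) {
                return false; // same inputs, different outcome
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // A check whose outcome depends only on its fixed input.
        BooleanSupplier stable = () -> "Test@123".length() == 8;

        // A stand-in for a flaky test: it fails on every third call.
        AtomicInteger calls = new AtomicInteger();
        BooleanSupplier flaky = () -> calls.incrementAndGet() % 3 != 0;

        System.out.println("stable deterministic? " + isDeterministic(stable, 10));
        System.out.println("flaky deterministic? " + isDeterministic(flaky, 10));
    }
}
```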

Maintenance of end-to-end tests comes at a high price

Test maintenance has been the death of many test automation initiatives. When it takes more effort to update the tests than it would take to re-run them manually, test automation will be abandoned. Your test automation initiative should not fall victim to high maintenance costs.

There’s a lot more to testing than simply executing and reporting. Environment setup, test design, strategy, and test data are often forgotten. You can watch the monthly invoice from your cloud provider of choice skyrocket as the resources required to run this ever-expanding test suite grow.

Automation test code should be treated as production code

Automation testers are often new to development and are suddenly tasked with writing complicated end-to-end tests in Selenium WebDriver. As such, they need to follow these rules:

Don’t copy and paste code. Copied and pasted code takes on a life of its own, and it must never happen. I see this a lot.
Don’t set up test code through the user interface. I have seen this many times, and you end up with bloated tests that re-run the same setup code over and over just to reach the point where the new scenario begins. Tests need to be independent and repeatable. The seeding or initialization of each new feature should take place through scripting or outside of the test.
Don’t use Thread.sleep and other hacks. A puppy dies in heaven every time an automation tester uses Thread.sleep with some arbitrary number in the futile hope that after x milliseconds the world will be as they expect. Failure is the only result of using Thread.sleep.
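
To illustrate the last rule, here is a minimal sketch of waiting on a completion signal instead of sleeping for an arbitrary time. The method saveProfileAsync is a hypothetical stand-in for any asynchronous operation that can report when it has finished, and the five-second value is an assumed upper bound, not a delay: the call returns as soon as the work is done.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;

final class AwaitSignalExample {
    // Hypothetical asynchronous operation that completes a future when it finishes.
    static CompletableFuture<String> saveProfileAsync(String name) {
        return CompletableFuture.supplyAsync(() -> "saved:" + name);
    }

    // Blocks until the operation itself signals completion.
    // The timeout only bounds the worst case; it is never waited out on success.
    static String saveAndAwait(String name) {
        try {
            return saveProfileAsync(name).get(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting", e);
        } catch (ExecutionException | TimeoutException e) {
            throw new IllegalStateException("operation did not complete", e);
        }
    }

    public static void main(String[] args) {
        System.out.println(saveAndAwait("testuser_1")); // prints "saved:testuser_1"
    }
}
```

The difference from Thread.sleep is that the test proceeds because something happened, not because time passed. This only works when the code under test exposes such a signal, which is far easier to arrange below the user interface than through it.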

Automation test code needs to come under the same scrutiny as real code. These difficult-to-write test scenarios should not be a sea of copy-and-paste hacks thrown together to reach the finish point.

Testers no longer want to test

I have some sympathy with this point: manual testing is not as compelling as writing code, so it is perceived as outdated and boring. Still, automation tests should be written after the manual testing, to catch regressions. A lot of automation testers that I have worked with do not like manual testing anymore, and it is falling by the wayside. Manual testing will catch many more bugs than writing one test with one fixed set of inputs.

It is now commonplace to write Gherkin syntax on a brand new ticket or story and go straight into writing the feature file and step definition. When this happens, manual testing is bypassed, and a regression test is written before the actual regression has happened. We are writing a test for a bug that will probably never happen.

Conclusion

In my estimation, we are spending vast sums of money and resources on something that’s just not working. The only good result that I have seen from automated testing is an insanely long build, and we have made change exceptionally difficult.

We are not sensible about automated testing. It sounds great in principle. Still, there are so many bear traps that we can quickly end up in a dead end where change is excruciating and difficult-to-maintain tests are kept alive for no good reason.

I will leave you with these questions that I think need to be answered:

Why is nobody questioning if the payback is worth the effort?
Why are we allowing flaky tests to be the norm, not the exception?
Why is re-running a test with the same inputs and getting a different result excusable to the point where we have runners such as Courgette that do this automatically?
Why is Selenium the norm when it is not fit for purpose?
Why are developers still resorting to a sea of waits, polling waits, and at worst Thread.sleep code in their rush to complete the task? This is the root of the flake.

The post Automation testing is not working appeared first on LogRocket Blog.

Top comments (3)

Artur Neumann • Nov 30 '19

e2e tests are not there to find new bugs but to prevent a dev from introducing a regression. So the finite input values do make sense: they stop the next commit from breaking something.

It's not possible to test every corner case in a UI test, so you rightly mention the test pyramid. If your higher-level test found a bug, you probably haven't done enough testing at the lower levels.

I have had good experiences with e2e tests preventing regressions and even catching bugs in upstream projects that did not test enough.

Main reason for Selenium over Cypress:
"Many browsers such as Firefox, Safari, and Internet Explorer are not currently supported."

But where you are right is this: how do we measure the effectiveness of a test suite? I don't have an answer to that.

gholden • Nov 19 '19

Who tests the testers? I asked this once when discussing automated tests: how was the code validated? How were the tests tested, to make sure they're actually fit for purpose?

People forget that automated tests are application code that needs maintaining. This is an added expense. They need to be used sensibly.

Jacqueline Binya • Nov 20 '19

Hey Brian, brilliant article as always.
I will deflect a bit, but how does one get to write for @LogRocket?
