End to End testing with Selenium — Retrospective

#automation #testing #selenium #qa

Original article here: End to End testing with Selenium — Retrospective

I worked on the QA automation team at Transparent Language for roughly a year. I helped establish the team and helped develop and mature its infrastructure, tools, and training processes. What follows is a look back at my greatest challenges, along with answers to common misconceptions about testing with Selenium and to common infrastructure problems.

What is Selenium?

Selenium is a software testing framework commonly used for developing unit tests for web applications, end to end tests, or automating otherwise repetitive tasks against websites or web applications. Selenium was originally developed in-house at ThoughtWorks and later open-sourced; WebDriver, which began as a separate project, was merged into it to form the most well-known and widely used implementation, "Selenium WebDriver". The idea behind WebDriver is quite simple: the Selenium team maintains an official API, you download bindings for your favorite programming language (C#, Python, Java, etc.), you acquire a webdriver for the browser you would like to test against, and finally your Selenium bindings talk to that webdriver, which acts as a marionette for your target browser.

The concept is quite clever, and Selenium allows developers to test their applications against all major browsers with relative ease. Naturally, our development teams did use Selenium for basic unit tests, but we had numerous products that talked to each other, and tasking our developers with writing true end to end tests while they were also building new features and fixing bugs didn't make sense, especially because we had multiple development teams for our in-house projects. The time needed to learn how the different components interacted with each other, and to write new tests against those assumptions, would have hindered further growth. Our solution was to create a dedicated team to develop all-encompassing tests.

First hurdle - writing reusable code

My team and I decided to write our tests in Python, as this was the dominant programming language used by our company for web services. My first hurdle when I began working with Selenium was designing clean, reusable code. It's "standard practice" to follow the Page Object Model when using Selenium, regardless of the bindings being used, and that makes sense. We began constructing generic objects for each page of our target web application, each containing simple getters that find DOM elements and construct Selenium WebElement instances. This greatly reduced our lead time for developing new tests, as we could now simply instantiate the defined page objects and lean on the robust Selenium bindings.

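from selenium.webdriver.common.by import By

class HomePage:
  def __init__(self, webdriver):
    # Keep a reference to the shared webdriver instance
    self.wd = webdriver

  @property
  def notification_box(self):
    # find_element_by_css_selector was removed in Selenium 4.3
    return self.wd.find_element(By.CSS_SELECTOR, 'div.notif')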

Soon after establishing our page object methodology, however, we ran into a brick wall. We were developing a new suite of tests for a new application, B, but to access this application we had to navigate through the application we'd previously written tests for, A. To solve this, we began creating a client package for each product we tested against, which we would export to be consumed by other projects that required definitions for foreign web pages.

from unittest import TestCase
from selenium import webdriver
# Page objects exported by another product's client package
from products.barfoo.pages import UserPage
from pages.foobar import HomePage

class AppTests(TestCase):
  def setUp(self):
    self.webdriver = webdriver.Chrome()

  def tearDown(self):
    # quit() closes the browser and ends the session; WebDriver has no exit()
    self.webdriver.quit()

  def test_user_page(self):
    home_page = HomePage(self.webdriver)
    user_page = UserPage(self.webdriver)
    # Navigate from application A's home page into the user page
    home_page.user_profile.click()
    self.assertEqual(user_page.username.text, "john")

Growing pains

As my team picked up the pace and our tests became more intricate, we realized our setup made state management quite tricky. Our page objects became bloated with routine functionality such as creating a new user or logging in. We also had to monkey-patch a lot of the core Selenium bindings to work uniformly with different browsers, because webdriver maintainers differed in how closely they adhered to the WebDriver specification. This led to inconsistencies across all of our projects. We needed a leaner codebase: a framework for designing our client packages and test suites.

My initial approach was to create another layer of abstraction. I called this the PMC model (Page, Modal, Controller).

It is an incredibly simple concept. Pages and Modals are separate entities that only define elements; any auxiliary functionality or stateful logic is handled by a Controller. Each product has its own controller that serves as a marionette for that product and lets us directly reference its page and modal entities. When a new controller is instantiated, it passes a webdriver instance, along with any configuration, down to its child pages and modals. This webdriver instance is automatically patched to work uniformly across browsers.

The PMC model helped us develop tests faster, keep our business logic and web application models separated, and simplify our client packages. Instead of having to import and instantiate n arbitrary page objects, we now only had to import and instantiate a single controller.
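
To make the split concrete, here is a minimal sketch of how a Page, Modal, Controller layout might look. It is not the original implementation: the class names (`FoobarController`, `LoginModal`), the `log_in` helper, and every CSS selector are hypothetical, and the cross-browser patching of the webdriver is left out. Locator strategies are passed as plain strings (`"css selector"` is the value of Selenium's `By.CSS_SELECTOR`) so the sketch has no import dependencies.

```python
class Page:
    """Defines elements only; holds no test logic or state."""

    def __init__(self, webdriver, config):
        self.wd = webdriver
        self.config = config


class HomePage(Page):
    @property
    def notification_box(self):
        return self.wd.find_element("css selector", "div.notif")


class LoginModal(Page):
    # Hypothetical selectors, for illustration only.
    @property
    def username_field(self):
        return self.wd.find_element("css selector", "#username")

    @property
    def password_field(self):
        return self.wd.find_element("css selector", "#password")

    @property
    def submit_button(self):
        return self.wd.find_element("css selector", "#submit")


class Controller:
    """Owns the webdriver and passes it down to its pages and modals."""

    pages = {}  # attribute name -> Page subclass

    def __init__(self, webdriver, config=None):
        self.wd = webdriver
        self.config = config or {}
        # Trickle the webdriver and configuration down to each child entity.
        for name, page_cls in self.pages.items():
            setattr(self, name, page_cls(self.wd, self.config))


class FoobarController(Controller):
    pages = {"home": HomePage, "login": LoginModal}

    def log_in(self, username, password):
        # Stateful, multi-step behaviour lives here, not on the page objects.
        self.login.username_field.send_keys(username)
        self.login.password_field.send_keys(password)
        self.login.submit_button.click()
```

A test would then instantiate only the controller, for example `app = FoobarController(webdriver.Chrome())`, and reach everything else through it (`app.home.notification_box`, `app.log_in("john", "secret")`).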

Note: Our PMC model was later replaced by a framework I'd developed explicitly for selenium testing with Python, py-component-controller.

Infrastructure

As our test suites grew much larger, we needed a leaner build pipeline. We were using Bamboo as our CI/CD tool of choice (later Concourse, to promote infrastructure as code), but our tests were bogging down our Bamboo agents, and cleaning up after ourselves proved much more tedious than expected. We also had to ensure our Bamboo agents were provisioned with our specific version of Python and the other modules and services the tests needed. We resolved this by using containers, namely Docker. With Docker, we could ensure that our tests ran inside an isolated environment we had complete control over, and cleaning up was as simple as deleting containers after they had served their purpose.

Testing against multiple browsers also became increasingly difficult, because we had decided to use Selenium Grid and host our grid internally rather than use a service such as Sauce Labs, which offers scalable Selenium infrastructure. We ran into a myriad of network-related issues, and the manual intervention required whenever an operating system or browser was updated cost us a great deal of time. After concluding that there was virtually no difference between testing against a virtualized browser and testing on a bare-metal machine, we decided to test explicitly against only Chrome and Firefox, which allowed us to ditch our grid setup and run tests faster than we ever had before. We again used Docker to run both Chrome and Firefox headlessly within a container.
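
As an illustration of the headless setup, the sketch below shows one way to start Chrome or Firefox headlessly from the Python bindings. The specific flags are assumptions, not the team's actual configuration: `--no-sandbox` and `--disable-dev-shm-usage` are commonly needed for Chrome inside containers, and the helper names `headless_arguments` and `build_driver` are made up for this example.

```python
# Assumed command-line flags per browser; adjust for your own images.
HEADLESS_ARGUMENTS = {
    "chrome": ["--headless", "--no-sandbox", "--disable-dev-shm-usage"],
    "firefox": ["-headless"],
}


def headless_arguments(browser):
    """Return a copy of the headless flags for the given browser name."""
    try:
        return list(HEADLESS_ARGUMENTS[browser.lower()])
    except KeyError:
        raise ValueError(f"unsupported browser: {browser!r}") from None


def build_driver(browser):
    """Create a headless webdriver for 'chrome' or 'firefox'."""
    # Imported lazily so the flag lookup above works without Selenium installed.
    from selenium import webdriver

    name = browser.lower()
    arguments = headless_arguments(name)
    if name == "chrome":
        options = webdriver.ChromeOptions()
        driver_cls = webdriver.Chrome
    else:
        options = webdriver.FirefoxOptions()
        driver_cls = webdriver.Firefox
    for argument in arguments:
        options.add_argument(argument)
    return driver_cls(options=options)
```

Inside the container, a test's `setUp` could then call `build_driver("chrome")` or `build_driver("firefox")` instead of constructing the driver directly.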

One other large challenge was ensuring consistency between our development and "production" test environments. To that end, we used Vagrant to give each developer a predictable development environment so that issues could be resolved as quickly as possible. We also used Ansible to automate repetitive tasks such as installing or updating specific browsers or webdrivers.

Decreasing lead time

With our infrastructure almost entirely automated, and our code base adhering to proper standards with a single framework for developing tests, we had presumed after months of maturing our processes that our lead time would have dropped significantly. We were wrong. These factors did help, and were ultimately the reason we reached a state where our tests and results could prove useful, but a major factor in ensuring a smooth lead time from newly added feature to test in production was properly understanding the scope of our end to end tests. After a certain point, we began testing anything and everything we possibly could, given how simple and streamlined it was to toss a new test into our existing suites. We had strayed from the "happy path", that is, testing regular user interactions, and had begun creating ever more complex tests for edge cases that could and would otherwise be discovered by manual QA.

We were using a waterfall approach, which had helped us develop our infrastructure and processes, but it also let us drift off on tangents into unimportant features, wasting time on over-engineering. The move to an agile workflow greatly helped structure and stabilize the team once we reached MVP. It can be quite easy to get carried away when establishing a project like automated testing, because you must figure out what does and doesn't work, as well as the right tools to use. However, it's essential for overall productivity and value to understand that your project, however intricate and robust, is at the end of the day simply a suite of tests that will not be used or consumed by an end user. It is simply a tool to validate business logic and performance.

Maintenance and cost

Once we reached MVP, maintaining our tests became quite simple. We had triggers to run tests against our internal test sites as well as our live sites. Our tests ran outside office hours, so feedback was waiting for us as soon as we entered the office. We also used Git webhooks in our build pipeline to trigger the particular test cases affected by a code base change, which helped maintain a fast feedback loop. Using Atlassian's Jira, the team was also able to coordinate with other development teams to ensure any breaking or major changes were promptly responded to.
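
One way to picture the "run only the affected tests" step is a lookup from changed file paths to test files. The sketch below is an assumption about how such a step could work, not a description of the actual pipeline: the `TEST_MAP` contents, the path patterns, and the `affected_tests` helper are all hypothetical, and it assumes the webhook payload has already been reduced to a list of changed paths.

```python
from fnmatch import fnmatchcase

# Hypothetical mapping from source path patterns to the tests that cover them.
TEST_MAP = {
    "services/auth/*": ["tests/test_login.py", "tests/test_signup.py"],
    "services/profile/*": ["tests/test_user_page.py"],
    "shared/*": ["tests/"],  # shared code changed: run the whole suite
}


def affected_tests(changed_files, test_map=TEST_MAP):
    """Return the test paths to run for a list of changed file paths."""
    selected = []
    for path in changed_files:
        for pattern, tests in test_map.items():
            if fnmatchcase(path, pattern):
                for test in tests:
                    # Preserve order and avoid scheduling a test twice.
                    if test not in selected:
                        selected.append(test)
    return selected
```

The returned paths could then be handed to the test runner, so a change under `services/profile/` would schedule only `tests/test_user_page.py`.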

Given our infrastructure, our only bottleneck was the resources available for running our test containers. Our containers ran both Chrome and Firefox, which made them rather CPU intensive. However, maintaining a single EC2 instance to run our tests on was much more cost-efficient than relying on services like Sauce Labs, which can become quite pricey when running tests in parallel.

Key takeaways

Summarizing this post, the following is what I learned while on the automation team at Transparent Language:

Automated tests are code. Properly maintaining large scale end to end tests requires DevOps practices, and a range of skills from basic web development to SysOps. Downplaying automated QA will lead to mediocre products, and can ultimately be a loss of revenue for your company if you spend months developing infrastructure that can't scale or offer you insightful feedback in a timely manner.
The payoff is there, but it may not always be necessary. Automated end to end testing can be an incredibly useful resource. It can provide instant feedback when your application has been deployed or updated. However, doing it correctly takes a significant time investment. You must be prepared to go months without valuable feedback, depending on the size of the product(s) you're testing against.
Automating QA does not mean manual QA is not required or shouldn't be performed. As valuable as automated quality assurance can be, it can't account for use cases that you haven't explicitly defined. When new features are introduced into applications,